Trove of brain cancer data available for study through Georgetown platform
One of the largest collections of brain cancer biomedical data in the United States is now available for study through the open access Georgetown Database of Cancer (G-DOC) platform.
The Repository for Molecular Brain Neoplasia Data (REMBRANDT) set contains information on 671 adult patients gathered from 14 institutions.
REMBRANDT includes genomic information from 261 samples of glioblastoma, 170 of astrocytoma, 86 of oligodendroglioma and a cross-section of others that are either mixed or of an unknown subclass, Subha Madhavan, PhD, chief data scientist at Georgetown University Medical Center and director of the Innovation Center for Biomedical Informatics at Georgetown Lombardi Comprehensive Cancer Center, told HemOnc Today.
There also is a cache of 13,000 data points on clinical outcomes of patients from whom these samples were derived.
The data were collected from 2004 through 2006 as part of an NCI-led study. The NCI transferred the information to Georgetown in 2015, where it now resides in G-DOC.
Madhavan and colleagues are developing tools to analyze and process the data. Meanwhile, researchers around the world can access the processed data in G-DOC and raw genomic data in the NCBI GEO repository as super series GSE108476.
HemOnc Today spoke with Madhavan about the importance of making such data sets freely available, the structure and organization of the REMBRANDT set, and the potential impact of researchers making use of these findings.
Question: How did this data set come about?
Answer: This project was developed by the NCI. Specifically, it was a project of the Glioma Molecular Diagnostics Initiative. There were 14 contributing institutions, including The University of Texas MD Anderson Cancer Center, Moffitt Cancer Center, Duke University, Henry Ford Cancer Institute, Dana-Farber Cancer Institute and the NCI. The database contains clinical information and genomic data collected from patients with brain tumors as part of this research study. We thought it would be useful to make this data set available for use with the novel tools we were building within G-DOC.
Q : What are these novel tools you’ve built within G-DOC?
A : One tool allows users to explore DNA copy number data. We have comparison tools that allow users to look at the gene expression profiles of patients with glioblastoma versus other brain malignancies. The molecular signatures are very different, so we can explore the molecular or genomic differences, or expression patterns of groups within G-DOC through a user-friendly interface without having to download the data. This is done using a point and click tool. You don’t have to write code. You can analyze this information very easily.
Q : Why is a platform like this necessary?
A : These tools and this data set will really help researchers who are designing new drugs for different types of cancer. Any given center will have just a handful of patients in a year for a certain tumor type, such as glioblastoma. Their collection might be 10 or 20 samples, or fewer. Because we have this unified collection, G-DOC allows investigators to study their genes of interest in this large collection of data with any given tumor and ask questions about the biomarkers involved. They can ask about differences in biomarkers among patients with good outcomes versus patients with poor outcomes. They can use this information to design the next generation of clinical trials. For example, asking a large data set if immune biomarkers are different in any way across tumor types or across different outcome groups can help us develop better immune therapy trials. Researchers could use it from a discovery standpoint before they design their trial. One also could do retrospective studies. You could retrospectively analyze the 671 patients and ask many scientific questions about specific biological pathways that your research might be focused on, like the EGFR signaling pathway.
Q : How are the data structured and organized?
A : They are organized using Amazon Web Services. We use cloud computing extensively as the back end for G-DOC for storing and analyzing large data sets.
Q : What makes this different from other data sets?
A : There are a couple of key differences. One is the extensive clinical information. Typically, publicly available patient-derived genomic data sets don’t have the associated clinical data from the same patients. This is a key component of REMBRANDT. The clinical and omics data are connected, making the analysis that much richer. The second is the tools that I described. Users do not need advanced programming skills to access and use this data set. Any scientist can point and click on a website to access this information.
Q : How often are clinicians and researchers using it now?
A : The G-DOC system has thousands of users around the world. They can access 40-plus data sets that are present, including REMBRANDT. The hope is that making this rich data set available in G-DOC would engage more investigators, and that is happening. We are seeing an increase in the number of groups and researchers who are accessing the data for research and educational purposes.
Q : Can you give a sense of how widely this will be used as word spreads?
A : We expect at least a 3- to 5-fold increase. There is a massive open online course built around G-DOC. Thousands of students from around the world are using this platform to learn about translational research.
Q : Will you be updating the data set and adding information to it over time?
A : We hope to do so. The Glioma Molecular Diagnostics Initiative study is closed, so there is no update on that data set. When other collaborators share their information from other brain tumor data sets with us, we’ll update those study data sets in G-DOC.
Q : What is the potential impact on research and/or clinical outcomes?
A : It’s hard to say. This is a discovery data set. At the end of the day, you need clinical trials and validation studies for the biomarkers that scientists can discover with this data set. It will depend on the downstream studies, and how they are applied to novel therapies. This is where the real impact will come from. – by Rob Volansky
Gusev Y, et al. Sci Data. 2018; doi:10.1038/sdata.2018.158.
For more information:
Subha Madhavan, PhD, can be reached at Innovation Center for Biomedical Informatics, 2115 Wisconsin Ave. NW, Suite 110, Washington, D.C. 20007; email: firstname.lastname@example.org.
Disclosure: Madhavan reports no relevant financial disclosures.