Harmonizing Data Sharing Efforts towards a Federated Data Model
Collaborators
Alejandro Sweet-Cordero, MD, University of California, San Francisco
Richard Gorlick, MD, MD Anderson Cancer Center
Lay Summary: A significant barrier to progress in studying pediatric cancer and in designing novel precision medicine clinical trials is that it is not currently possible to visualize all research data in one place. The data are held in individual labs and in many separate databases, some of which are difficult to access because of procedural, regulatory, reporting and standardization issues. To maximize the success of pediatric cancer research, barriers to data access and sharing must be minimized. Through its support of pediatric cancer research, ALSF makes possible much-needed advances in the field, and the Crazy 8 Initiative is accelerating this work. As one important example, fusion-negative sarcomas are a diverse and understudied subset of pediatric cancers. To date, several different groups have performed comprehensive sequencing of osteosarcoma (OS) and embryonal rhabdomyosarcoma (ERMS). To address the data access problem for these cancers, our collaborators, Drs. Sweet-Cordero and Gorlick, propose to build a public resource that will allow visualization and sharing of all available ERMS and OS sequencing data. Because we have experience creating a similar public resource for the uniform processing and sharing of pediatric RNA sequencing data, we propose to: assist in harmonizing useful clinical metadata; offer standardization approaches for data dictionaries in the ERMS and OS public resource; link the data from the resource with other data from the same donors in other repositories and databases; and develop language for access and use agreements that are minimally burdensome on researchers.
Project Update 2020: As part of the Crazy 8 Pilot Project, our collaborators at UCSF developed a public pediatric sarcoma database as a proof of concept to test whether data for at least a single disease type could be centralized. Under our Crazy 8 Pilot Grant, "Harmonizing Data Sharing Efforts towards a Federated Data Model," Treehouse ensured that the clinical information in the new resource was consistent and clear (we call this "harmonizing") and helped link the samples in the new pediatric sarcoma database to existing data from the same sample (or its donor) in other public databases. This harmonizing and linking will let researchers more easily access all the publicly available data for a particular sample or donor, which we hope will make both the data and the new data sharing resource more impactful. Although this pilot project started with a small number of samples, just 77, it has enabled us to identify challenges to this type of data sharing and to develop ideas about how to address them in larger data sharing initiatives in the future.