The Childhood Cancer Data Lab (Data Lab) started in 2017 with a mission of empowering childhood cancer researchers through big data to accelerate the path to cures. Our latest project, the Single-cell Pediatric Cancer Atlas (ScPCA), is designed to provide broad access to the advanced cellular data acquired through a cutting-edge technique called single-cell profiling.
Read on to learn about the project’s aims, how the ScPCA Portal was developed, and what comes next!
What Are Single-cell Profiling and the Single-cell Pediatric Cancer Atlas?
In 2019, Alex’s Lemonade Stand Foundation (ALSF) funded ten awards for childhood cancer investigators working on single-cell profiling to create a publicly available atlas of single-cell pediatric cancer data. Single-cell profiling makes it possible to examine individual cells and gain insight into the heterogeneity of cells within a tumor. Because not every cell in a tumor is the same, this technique helps us understand how particular cells influence cancer progression and treatment response. The ALSF-funded projects collected samples from different cancer types and generated data they could then share with the Data Lab.
The goal was to create a uniformly processed, openly available database that researchers anywhere could explore. At launch, the ScPCA Portal contained 189 patient samples representing 28 tumor types, and it has been growing ever since.
Delivering this data to researchers efficiently was one challenge, but we also needed to make sure researchers wouldn’t each have to solve the same problems over and over: how to transfer, store, and process the data for their own use. Handling those steps for them would keep the resource open to as many researchers as possible, which is the Portal’s core goal.
To reach as many researchers as possible with data in an immediately useful format, we decided a web interface made the most sense. Now we just had to build it.
What Went into Building the Single-cell Pediatric Cancer Atlas
While the research projects were in progress, we set out to create the ScPCA Portal, which officially launched in March 2022. The process was complex and aimed at solving the key problems described above. Here’s a snapshot of what went into it.
Audience
To deliver the product quickly, we initially had to narrow the scope of our intended audience.
Within the broad childhood cancer research community, we wanted to define a more specific audience. At first, we chose to focus on researchers with more advanced data processing and programming skills. That choice let our team make more informed decisions about which features the Portal would offer at launch.
Since then, we have added features and enhancements for a broader audience. But more on that later!
Features at Launch
Like refine.bio, our online repository of transcriptome data, the ScPCA Portal was designed to give researchers data they would otherwise have had to spend their own time collecting and processing. All data are also pre-processed, freeing up researchers’ time and speeding along their own projects.
Another benefit of the Portal is that it opens single-cell technology to a wider audience. Single-cell profiling is still a new technology and may not be readily available to all researchers. So if a researcher’s lab can’t generate single-cell data itself, they can turn to the Portal for data that may benefit their project. Drawing on ALSF’s single-cell grants, the Portal makes data from multiple tumor types available to researchers working across different kinds of pediatric cancer.
Like many other Data Lab initiatives, sharing is central to the ScPCA’s design and functionality. The ScPCA uses an open-source pipeline, meaning it’s freely available for others to use with their own data. A critical step in building the Portal was deciding exactly how the Data Lab would:
- prepare this massive amount of data to be immediately useful to researchers
- ensure uniform processing for incoming samples as the portal continues to grow
- make sure other researchers could quickly, easily, and affordably process their own data too
Data Processing Pipeline
From a technical perspective, accomplishing the ScPCA’s goals also required evaluating which software could process this data uniformly, quickly, efficiently, and affordably. This is where cost-saving efforts came into play.
The processing pipeline comprises all the steps that take the raw data researchers provide to the Data Lab and turn it into the polished output users see in the ScPCA Portal. Software such as Cell Ranger, a popular tool, can handle that process. However, this commonly used tool has some downsides:
- It uses a lot of memory (RAM), requiring more computing power
- It takes a long time to run
- Together, these mean renting costly computers for extended periods of time
- Ultimately, it becomes expensive to run because of the resources and time required
Most of the Data Lab’s intended audience has likely used Cell Ranger. But we wanted a tool that delivered similar results while allowing faster and cheaper data processing. After comparing newer methods against Cell Ranger, we settled on a more computationally efficient tool called alevin-fry.
What Does alevin-fry Do?
It processes single-cell and single-nuclei RNA-sequencing data just like Cell Ranger, but with half the memory per sample and in at least half the time. That saves both money and time.
With the critical components identified (the audience, the features available at launch, and the processing pipeline), we needed to validate the decisions we had made.
Conducting Usability Evaluations
Usability evaluations help us understand what is functioning as intended and where improvements are needed. They’re a critical part of our process for any project. If you’ve ever tried out a product and given feedback on it, it’s the same idea.
The Portal was first launched in beta, a testing phase before all of its features are live, and we invited researchers to try it out and provide feedback. A few key points came out of those evaluations:
- Redesigning the download modal to make it easier to understand.
- Making it clear that the available data are already uniformly processed.
- Addressing usability issues on Windows machines to ensure users with different operating systems get the same experience.
Each of these was integral to delivering the best possible product at launch. We also invited external researchers to test the data processing pipeline and evaluate its instructions. This initial round of community feedback was a key part of introducing the ScPCA Portal in 2022.
Enhancements After Launching
The ScPCA has the potential to serve the pediatric cancer research community and change the lives of children fighting cancer in many more ways! Since the launch, we have introduced new features and made a number of improvements to the Portal and pipeline. These changes have widened the Portal’s user base and made it easier for researchers to put their ScPCA data downloads to work.
For example, researchers can now choose which file format they receive when downloading data, so downloads can be used immediately with two major software ecosystems for working with single-cell data (SingleCellExperiment objects in R/Bioconductor and AnnData objects in Python). This means more users can skip the time-consuming process of converting their ScPCA data to their preferred format and start working with it faster.
Our team also continues to expand the pipeline’s capabilities to meet the needs of more researchers who may wish to use it. We made sure the pipeline could be easily extended and adapted to support more data types. This became even more important when we opened a community call for contributions to the ScPCA Portal!
Community Involvement
Expanding the Portal through community contributions also became a goal of the ScPCA. The Data Lab can now accept contributions from pediatric cancer researchers with existing single-cell and single-nuclei RNA-seq datasets that were not funded through the original ScPCA grants. The first successful call for contributions launched in May 2023.
So far, two pediatric cancer research labs have shared their data on the Portal, adding 15 diffuse midline glioma samples and 4 neuroblastoma cell lines. Data contributors must use our open-source pipeline to ensure that all data on the Portal are uniformly processed. In recognition of its contribution, each lab received a one-time grant of unrestricted funds for childhood cancer research.
Community contributions will be integral to growing the Portal and putting more data in the hands of pediatric cancer researchers.
The Future of the Single-cell Pediatric Cancer Atlas
Despite the Portal’s overwhelmingly positive reception from the research community, the Data Lab knows firsthand, from training hundreds of pediatric cancer researchers in analysis, that making data available is not enough to change the future for children with cancer. We will continue to gather feedback from the community, develop new features, and make the Portal an even more valuable resource for researchers. To that end, we have also launched an open science initiative to improve the utility of the ScPCA data.
In 2024, the Data Lab launched the OpenScPCA project. This is an open, collaborative project to analyze and improve the utility of the ScPCA data. The project is designed to let researchers around the world contribute analyses and share results rapidly, with all analytical code made publicly available in real time. One major project goal is assigning cell types to all samples on the Portal. Adding cell types will give researchers far more information about the samples, allowing them to make more discoveries faster. This work will be carried out openly, so others can see how analyses were done and make more informed decisions about using the ScPCA data for their own analyses. OpenScPCA will enhance a freely available data resource while giving collaborators ample opportunities for learning and sharing.
The ScPCA and its future are all part of the Data Lab’s mission to empower pediatric cancer experts poised for the next big discovery with the knowledge, data, and tools to reach it.