“Data is the lifeblood of science,” says Jaclyn Taroni, PhD, data scientist at the Childhood Cancer Data Lab.
By: Trish Adkins
When you think of childhood cancer research, you may imagine samples in test tubes and microscopes or a drug being tested in a clinic. And while childhood cancer research certainly happens in a biochem lab and in a clinic, it also happens inside the Childhood Cancer Data Lab (CCDL). Funded by ALSF, the CCDL team is working to harness the power of big data and use it to cure childhood cancer.
So what is big data?
“Data is the lifeblood of science,” says Jaclyn Taroni, PhD, data scientist at the CCDL. “Data can better equip researchers to ask the really important scientific questions and to do so efficiently and robustly, as they work towards cures for childhood cancer.”
The team at the CCDL takes large volumes of childhood cancer research data, which includes things like oncogenes and mutations as well as known therapeutic agents and drugs, and places it in formats that are easily mineable, widely available and broadly reusable. Data scientists like Dr. Taroni then train researchers to use the database and its data to accelerate cures, literally!
By using refine.bio, the data tool created by the CCDL, oncology researchers no longer have to wait weeks for usable data.
We interviewed Dr. Taroni about her work with the CCDL:
ALSF: What is a data scientist, exactly?
JT: This really depends on who you ask! Data science is such a broad field, but it has been described as a combination of statistics knowledge, programming skills and domain expertise (biology in my case). I think it is more accurate to call me, personally, a computational biologist.
At the CCDL, I analyze biological data to better understand the processes that underlie diseases like pediatric cancer. I figure out what data processing pipelines we should use in our refine.bio project, I design and execute experiments, and I work with our User Experience Designer, Deepa, to understand how we can meet the needs of the childhood cancer research community.
ALSF: When you were 10 years old, what did you want to be when you grew up?
JT: I wanted to be a lawyer because I thought that was a profession that would allow me to essentially argue for a living. Presenting a carefully reasoned argument was something that was valued in my household growing up. Being a scientist doesn't diverge too, too far from that in my opinion. It's about collecting evidence, synthesizing information and presenting it to others.
ALSF: How did you choose data science?
JT: When I went to graduate school, I thought I was going to be an immunologist. It just so happened that I ended up doing my first research rotation in a computational lab and I enjoyed the work. I took the position at the CCDL because of our ambitious-but-important mission and because it allows me to make an impact through our projects and training efforts.
ALSF: What challenges have you faced in your career?
JT: I think it has been challenging for me to take up space at times. This has manifested itself in a number of different ways for me personally: it can be difficult to say no to things, ask for what I need, or to feel comfortable putting forth ideas. I think there is also some uncertainty and failure inherent to being a scientist and that has sometimes felt like the result of personal failure. Working through this has been a big part of my story and there is certainly still work to be done.
ALSF: If cancer were cured, what would you be doing?
JT: I've been told that I make pretty good coffee. We've established a protocol where I weigh out the coffee beans every time we make a pot of coffee at the CCDL. So perhaps I would be a barista, but I think it's more likely that I would be a computational biologist working in a different disease area.
Come back to the ALSF blog next week to discover more about the CCDL's landmark work and future plans in an interview with its director, Casey Greene, PhD.
Launched in August 2017, the CCDL is the first big data lab of its kind dedicated to childhood cancer. Its goal is to translate publicly available disease data into one consistent format so that all researchers can access and understand it. The CCDL launched refine.bio, a tool designed to collect all publicly available childhood cancer data in one convenient location. The team also trains researchers to access and use the data in refine.bio, breaking down barriers in the lab and giving researchers greater insights, which can lead to more targeted treatments for children with cancer. Learn more about the CCDL here.