Riding the data wave
Big data is revolutionising science. But as well as changing physics, chemistry and biology, it’s changing the nature of science itself. Institute researchers Wolf Reik and Stefan Schoenfelder, together with bioinformatics expert Simon Andrews, reflect on how big data is reshaping not only the way they work but also how they think. And we discover how bioinformatics – once considered a geeky corner of biology by some – has become central to scientific progress.
When Professor Wolf Reik, head of the Epigenetics programme, thinks about how big data has revolutionised his field, he remembers the Swiss anatomist Karl Theiler. Theiler spent a lifetime creating ‘The House Mouse: Atlas of Embryonic Development’, painstakingly taking embryos at different stages of development and using staining and microscopy to identify each tissue type.
“Now, we take the same embryos and put them in a big machine which sequences up to 100,000 cells at a time. Through gene expression, it gives us an equally detailed atlas of development, but because we can now use multi-omics methods that link together different layers in a single cell – the epigenome and the transcriptome – we can ask much deeper questions about how these patterns arise mechanistically. That’s what we really want to know,” says Reik.
Despite being a relic of an earlier scientific age, Theiler’s atlas remains on Reik’s bookshelves: an illustration of how big data has transformed the scientific questions he can ask and an embodiment of how it’s reshaped the way his younger colleagues think.
Before big data, researchers thought about and worked on single genes – how they were regulated and their role in development, health and ageing. Now, thanks to recent developments in next-generation sequencing, the focus is firmly on the genome as a whole. “We can now look at 20,000 genes or 20,000 promoters and get huge amounts of information. The younger members of my group get excited about the whole genome and what it’s doing, whereas I was brought up in an era of asking what single genes do; it’s a fundamental difference in thinking,” says Reik.
Big data brings huge opportunities, but techniques that generate massive amounts of increasingly complex data also pose major computational challenges. So how do Reik and other researchers extract meaning from this deluge? The answer lies in bioinformatics, the science that has emerged at the intersection of biology, computer science and statistics.
Dr Simon Andrews, head of Bioinformatics, belongs to this new breed of experts. Since joining the Institute in 2001, he’s seen the group expand from two to 10 staff, many of whom have their roots in biology.
“Lots of people in my group were once biologists who happened to play with computers for fun. My mother was a primary school teacher. Sometimes she’d turn up with a computer that had been donated to the school, point me towards it and say ‘make this do something that I can take back into the classroom!’” Andrews recalls. “At university I built my own computers because we couldn’t afford to buy them, and when I started research we were beginning to get electronically-generated data.”
His PhD generated a respectable 1,000 bases of DNA sequence. Today, a single sample at the Institute yields 40 billion. “The fundamental change is that many experiments generate amounts of data that are impossible to understand without a computer. Before, computers were a nice add-on; now they are fundamental,” he says.
One of the Institute’s core facilities, the Bioinformatics group provides computational power and data analysis, plus expert advice and bespoke development work. “What fires me up are computational problems that spring from biology,” says Andrews. What researchers often need most are ways of making their data more accessible, and over several years Andrews’ group has developed software packages capable of visualising sequencing data sets with billions of data points. “These are unfathomable on their own, but we can turn them into billions of positions in a genome, and visualise what they look like,” he explains.
Like Reik and Andrews, Dr Stefan Schoenfelder has lived through the revolution wrought by next-generation sequencing and big data. “It changes the way you think and changes the way we work,” he says. “When I did my PhD 15 years ago I spent all my time doing experiments in the lab. Now it’s the analysis that takes the time.”
Schoenfelder is interested in how gene function and gene expression are controlled by non-coding bits of DNA known as regulatory sequences. Along the linear DNA sequence, genes and their regulatory elements may lie some distance apart, so how the genome folds in three dimensions to bring them into contact is one of his key questions. “Whereas we used to look at individual examples, now it’s possible to address those questions genome wide. We can get a complete picture of all the interacting sequences in a cell,” he says. “When I came here after my PhD, it was something I thought might happen at the end of my career. That it’s happened so quickly is incredible.”
This also means that researchers need to learn how to interpret such data, which is where the Institute’s Bioinformatics group makes a major difference. “The skills I was equipped with in my PhD are not enough anymore. It’s normal to keep learning in science, but this is a quantum leap,” says Schoenfelder. “In a competitive field you need to work rapidly. I often work with dedicated bioinformaticians because it’s almost impossible to be an expert in both.”
The next scientific revolution is anyone’s guess, but Schoenfelder is sure it will only underscore how much more we need to understand. “Sequencing and its impact on personalised medicine will continue to grow. High-resolution microscopy, observing live cells and even individual molecules, will be another game changer,” he concludes. “We make contributions all the time, but we know so little. That’s humbling – but it’s also very exciting to be a part of.”
This feature was written by Becky Allen for the Annual Research Report 2018.