Canzar Lab - Research
Finding the Invisible, and Improving Accuracy, in Genetic Transcription
The latest RNA-sequencing techniques create an avalanche of fragmented genetic data. Our lab uses advanced algorithms to reassemble that information and detect missing patterns.
Accurately reconstructing the RNA molecules that cells produce from their genetic blueprints can help us understand cellular biology and identify specific patterns in genes that are associated with disease. The preferred method of converting samples of cellular RNA into genetic data is RNA sequencing (RNA-seq), which creates tens of millions to hundreds of millions of small data fragments, known as reads.
Piecing together the resulting mass of highly fragmented data into the full-length molecular sequences, known as transcripts, is difficult. Together with colleagues at Freie Universität Berlin and Centrum Wiskunde & Informatica, we have developed a way to use advanced algorithms to reconstruct RNA sequences more accurately and comprehensively than other current methods.
In RNA-seq, scientists break the RNA content of a cell into fragments and sequence them, producing tens of millions to hundreds of millions of short reads of 100 to 150 bases each. Reads can be pieced together to reconstruct an RNA transcript. But piecing the reads back together in the correct combinations is difficult, because each read often contains little information about where it belongs.
“It is a big jigsaw puzzle. You want to figure out the big picture—these RNAs—but you only have these small fragments, these puzzle pieces.”
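To make the puzzle concrete, here is a minimal Python sketch of why a single read can say little about where it belongs. It is a toy illustration, not the lab's method. The sequences, the transcript names, and the helper `compatible_transcripts` are all hypothetical, and the reads are far shorter than real ones. Two made-up transcripts share their first and last segments, so a read taken from a shared segment fits both. Only a read spanning the point where the transcripts differ identifies its source.

```python
# Hypothetical sequence segments, made up for illustration only.
SEGMENT_A = "ATGGCGTACG"
SEGMENT_B = "TTGACCGGAT"
SEGMENT_C = "CCATAGGCTA"

# Two toy transcripts that share segments A and C and differ in the middle.
transcripts = {
    "transcript_1": SEGMENT_A + SEGMENT_B + SEGMENT_C,
    "transcript_2": SEGMENT_A + SEGMENT_C,
}


def compatible_transcripts(read, transcripts):
    """Return the names of all transcripts that contain the read exactly."""
    return sorted(name for name, seq in transcripts.items() if read in seq)


# Toy reads (real reads are 100 to 150 bases; these are 6 for readability).
reads = {
    "inside_shared_segment": SEGMENT_A[2:8],
    "spans_A_to_B": SEGMENT_A[-3:] + SEGMENT_B[:3],
    "spans_A_to_C": SEGMENT_A[-3:] + SEGMENT_C[:3],
}

for name, read in reads.items():
    print(name, read, compatible_transcripts(read, transcripts))
```

The read from inside the shared segment matches both toy transcripts, while each boundary-spanning read matches only one. Real data adds sequencing errors, uneven coverage, and many more candidate transcripts, which this sketch ignores.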