McSplicer: a probabilistic model for estimating splice site usage from RNA-seq data
Alternative splicing removes intronic sequences from pre-mRNAs in alternative ways to produce different forms (isoforms) of mature mRNA. The composition of expressed transcripts gives specific functionalities to cells in a particular condition or developmental stage. In addition, a large fraction of human disease mutations affect splicing and lead to aberrant mRNA and protein products. Current methods that interrogate the transcriptome based on RNA-seq either suffer from short read length when trying to infer full-length transcripts, or are restricted to predefined units of alternative splicing that they quantify from local read evidence.
Instead of attempting to quantify individual outcomes of the splicing process, such as local splicing events or full-length transcripts, Stefan Canzar and his team, in collaboration with Heejung Shim at the University of Melbourne, propose to quantify alternative splicing using a simplified probabilistic model of the underlying splicing process. The model is based on the usage of individual splice sites and can generate arbitrarily complex types of splicing patterns. In their implementation, McSplicer, they estimate the model parameters using all read data at once, and their experiments demonstrate that this yields more accurate estimates than competing methods. The model can describe multiple effects of splicing mutations using a few easy-to-interpret parameters, as they illustrate in an experiment on RNA-seq data from autism spectrum disorder patients.
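To give some intuition for how per-site usage probabilities can generate different splicing patterns, here is a minimal toy sketch in Python. It is not McSplicer's actual model or code: the site list, the two-state walk (in exon / in intron), and the names `sample_transcript` and `pattern_frequencies` are all assumptions made for illustration. Each splice site is used with its own probability, and the pattern of included segments follows from those independent decisions.

```python
import random
from collections import Counter

# Hypothetical toy gene (illustrative values, not taken from the paper):
# ordered splice sites, each with a type and a usage probability.
# A used "donor" ends an exon; a used "acceptor" starts one.
TOY_SITES = [("donor", 0.8), ("acceptor", 0.5), ("acceptor", 1.0)]


def sample_transcript(sites, rng):
    """Walk the sites left to right and return which segments are included.

    Segment 0 lies before the first site and is assumed to be exonic.
    Segment i + 1 lies after site i.
    """
    in_exon = True
    included = [True]
    for kind, usage in sites:
        used = rng.random() < usage
        if in_exon and kind == "donor" and used:
            in_exon = False  # splice out: the following segment is intronic
        elif not in_exon and kind == "acceptor" and used:
            in_exon = True   # splice in: the following segment is exonic
        included.append(in_exon)
    return tuple(included)


def pattern_frequencies(sites, n_samples=20000, seed=0):
    """Estimate the relative frequency of each inclusion pattern by simulation."""
    rng = random.Random(seed)
    counts = Counter(sample_transcript(sites, rng) for _ in range(n_samples))
    return {pattern: count / n_samples for pattern, count in counts.items()}


if __name__ == "__main__":
    for pattern, freq in sorted(pattern_frequencies(TOY_SITES).items()):
        print(pattern, round(freq, 3))
```

In this toy gene, three usage parameters already yield three distinct patterns: intron retention when the donor is skipped, and two alternative acceptor choices when it is used. Changing a single usage value, as a splicing mutation might, shifts the frequencies of several patterns at once.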
McSplicer: a probabilistic model for estimating splice site usage from RNA-seq data.
Alqassem I, Sonthalia Y, Klitzke-Feser E, Shim H, Canzar S.
Bioinformatics. 2021 Jan 30:btab050. doi: 10.1093/bioinformatics/btab050