t-SNE and UMAP for single-cell multimodal omics
Single-cell RNA sequencing has enabled gene expression profiling at single-cell resolution and provided novel opportunities to study cellular heterogeneity, cellular differentiation and development. Emerging single-cell technologies assay multiple modalities such as transcriptome, genome, epigenome, and proteome at the same time. The joint analysis of multiple modalities has made it possible to resolve subpopulations of cells at higher resolution, has helped to infer the “acceleration” of RNA dynamics and to extend the time periods over which cell states can be predicted, and has linked dynamic changes in chromatin accessibility to transcription during cell-fate determination. A fundamental step in the analysis of high-dimensional single-cell data is their visualization in two dimensions. Arguably the most widely used nonlinear dimensionality reduction techniques are t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP). Currently, these techniques are applied to each modality one at a time, and the resulting separate views of the data need to be reconciled by manual inspection.
Van Hoan Do and Stefan Canzar introduce j-SNE and j-UMAP as natural generalizations of t-SNE and UMAP to the joint visualization of multimodal omics data. Their approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. In a simulation study and on eight real datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.
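To make the idea of learning per-modality contributions more concrete, the following toy sketch shows one plausible way such weights could be derived. It is not the authors' implementation, and the summary above does not spell out the objective. The sketch assumes a joint loss of the form sum_k a_k * KL_k + lam * sum_k a_k * log(a_k), where KL_k measures how poorly the shared embedding reproduces the neighborhood structure of modality k, the weights a_k are nonnegative and sum to one, and lam is a hypothetical regularization strength. Under that assumption the optimal weights have a closed form proportional to exp(-KL_k / lam). All function names and toy distributions are illustrative.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def modality_weights(kl_per_modality, lam=1.0):
    """Weights minimizing sum_k a_k*KL_k + lam*sum_k a_k*log(a_k) on the simplex.

    Assumed objective (see lead-in). Setting the Lagrangian gradient to zero
    gives a_k proportional to exp(-KL_k / lam).
    """
    shift = min(kl_per_modality)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(kl - shift) / lam) for kl in kl_per_modality]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def joint_loss(kl_per_modality, weights, lam=1.0):
    """Weighted fit term plus the (negative-entropy) regularizer on the weights."""
    fit = sum(a * kl for a, kl in zip(weights, kl_per_modality))
    reg = lam * sum(a * math.log(a) for a in weights if a > 0)
    return fit + reg


# Toy example: similarities in a shared embedding (q) versus two modalities.
q = [0.5, 0.3, 0.2]
p_rna = [0.5, 0.3, 0.2]      # matches the embedding exactly
p_protein = [0.2, 0.3, 0.5]  # disagrees with the embedding

kls = [kl_divergence(p_rna, q), kl_divergence(p_protein, q)]
weights = modality_weights(kls, lam=0.5)
```

In this toy setting the modality that the embedding already fits well receives the larger weight, while the regularizer keeps the other weight from collapsing to zero. Increasing lam pushes the weights toward uniform; decreasing it concentrates them on the best-fitting modality.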
A generalization of t-SNE and UMAP to single-cell multimodal omics
Do VH and Canzar S.
Genome Biol 22, 130 (2021). https://doi.org/10.1186/s13059-021-02356-5