A unified framework for integrating spatial and single-cell transcriptomics data using deep generative models

  • Xiaomeng WAN

Student thesis: Doctoral thesis

Abstract

The rapid emergence of spatial transcriptomics (ST) technologies is revolutionizing our understanding of tissue spatial architecture and biology. ST technologies measure transcriptomes while retaining spatial information, offering an unprecedented opportunity to uncover transcriptomic landscapes across tissues. ST datasets have provided new insights into tissue composition and function and have accelerated efforts to elucidate the development of healthy tissue and the tumor microenvironment of cancers. Current ST technologies, based on either next-generation sequencing (seq-based approaches) or fluorescence in situ hybridization (image-based approaches), are highly informative but remain unable to provide spatial characterization at transcriptome-wide single-cell resolution. Seq-based approaches, such as 10x Visium [1] and Slide-seq [2], detect transcriptome-wide gene expression within spatial spots, but each spot often contains multiple cells. They therefore do not achieve single-cell resolution, which limits their use in resolving detailed tissue structure and in characterizing cellular communication (e.g., identifying ligand-receptor interactions [3]). Image-based approaches, such as seqFISH [4] and MERFISH [5], achieve single-cell resolution but are limited to profiling panels of tens to hundreds of genes per sample, leaving the majority of the transcriptome unmeasured. Users of image-based methods need well-defined biological hypotheses to design an appropriate gene panel, so incidental discoveries are unlikely.
On the other hand, single-cell RNA sequencing (scRNA-seq) characterizes the whole transcriptome of individual cells within a given organ, providing remarkable opportunities for broad and deep biological investigations of diverse cellular behaviors [6, 7, 8]. However, scRNA-seq does not capture the spatial distribution of cells because samples must undergo tissue dissociation [9]. Because spatial information is critical to understanding communication between cells, many scientific questions about cellular communication cannot be fully addressed by scRNA-seq alone [10]. Ideally, integrating single-cell and ST data should allow us to characterize the spatial distribution of the whole transcriptome at single-cell resolution by combining their complementary information. However, existing integration methods are far from satisfactory in real data analysis [11]. Deconvolution methods applied to seq-based ST data only estimate the proportions of cell types in each spatial spot and cannot achieve single-cell resolution. For image-based ST data, methods developed to infer unmeasured gene expression are not sufficiently accurate, especially when ST expression data are sparse [11]. In this thesis, we propose a unified framework, SpatialScope, to integrate scRNA-seq reference data and ST data, resulting in the characterization of the spatial distribution of the whole transcriptome at single-cell resolution. By leveraging a deep generative model to accurately approximate the distribution of gene expression from the scRNA-seq reference data, SpatialScope can resolve spot-level data composed of multiple cells to single-cell resolution when applied to seq-based ST data, correct low-accuracy genes for high-resolution spatial data such as Slide-seq, and infer transcriptome-wide expression levels for image-based ST data.
We demonstrate the utility of SpatialScope through comprehensive simulation studies and then apply it to real data from both seq-based and image-based ST approaches. SpatialScope provides a spatial characterization of tissue structures at transcriptome-wide single-cell resolution, greatly facilitating downstream analysis of ST data, such as detection of cellular communication by identifying ligand-receptor interactions from seq-based ST data, localization of cellular subtypes, and detection of spatially differentially expressed genes.
Date of Award: 2023
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
Supervisor: Can YANG
