DGSIST: Clustering spatial transcriptome data based on deep graph structure Infomax

  • College of Information Science Technology, Hainan Normal University, HaiKou City 571158, China; Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, HaiKou City 571158, China.

Summary

This summary is machine-generated.

The Deep Graph Structure Infomax (DGSI) model and the DGSIST framework built on it use spatial transcriptomics data for accurate cell clustering and spatial domain identification. The approach is unsupervised and is intended to improve understanding of tissue organization and of tissue structure in disease.

Area Of Science

  • Computational Biology
  • Bioinformatics
  • Genomics

Background

  • Spatial transcriptomics offers insights into gene expression and tissue structure, but analyses often underutilize the spatial information it provides.
  • Graph neural networks present an opportunity to integrate spatial information with gene expression data.
  • Existing methods may not fully exploit the rich spatial context available in transcriptomic datasets.
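
To illustrate how spatial information and gene expression can be combined on a graph, here is a minimal sketch. It is not the DGSI model: it assumes a k-nearest-neighbor graph built from spot coordinates and a plain neighbor average in place of a learned graph convolution. The helper names `knn_neighbors` and `aggregate`, and the toy data, are hypothetical.

```python
import math

def knn_neighbors(coords, k):
    """Build a symmetric k-nearest-neighbor graph from 2-D spot coordinates."""
    n = len(coords)
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        # Rank every other spot by Euclidean distance to spot i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(coords[i], coords[j]))
        for j in order[:k]:
            neighbors[i].add(j)
            neighbors[j].add(i)  # make the edge undirected
    return neighbors

def aggregate(expr, neighbors):
    """Average each spot's expression vector with those of its neighbors.

    Assumption: a real graph convolution would also apply learned weights
    and a normalized adjacency matrix; this keeps only the averaging step.
    """
    smoothed = []
    for i, nbrs in enumerate(neighbors):
        group = [i, *sorted(nbrs)]  # include the spot itself (self-loop)
        dim = len(expr[i])
        smoothed.append([sum(expr[j][d] for j in group) / len(group)
                         for d in range(dim)])
    return smoothed

# Toy data (hypothetical): four spots in two spatially separate pairs,
# each with a two-gene expression vector.
coords = [(0, 0), (0, 1), (5, 5), (5, 6)]
expr = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]

neighbors = knn_neighbors(coords, k=1)
smoothed = aggregate(expr, neighbors)
```

After aggregation, spots that are close in space share similar profiles, which is the kind of local context a graph-based model can exploit.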

Purpose Of The Study

  • To develop an unsupervised model, DGSI (Deep Graph Structure Infomax), for processing graph data from spatial transcriptomics.
  • To introduce the DGSIST framework, integrating DGSI with dimensionality reduction and clustering for accurate cell type identification.
  • To enhance the analysis of spatial transcriptomics data, improving cell clustering and spatial domain identification.
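
The clustering stage of such a framework can be sketched as follows. This is a minimal standard-library version of k-means with k-means++ seeding, run on a toy set of low-dimensional embeddings. It assumes the embeddings have already been produced and reduced by earlier stages; the function names and data are hypothetical and are not taken from the DGSIST code.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeanspp_init(points, k, rng):
    """k-means++ seeding: first center uniform at random, each later center
    drawn with probability proportional to squared distance from the
    nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, iters=20, seed=0):
    """Lloyd iterations starting from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy embeddings (hypothetical): two well-separated groups of three points.
embeddings = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers = kmeans(embeddings, k=2)
```

Which integer each group receives depends on the random seeding, so only the grouping itself is meaningful, not the label values.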

Main Methods

  • Developed the DGSI model using graph convolutional networks, trained without labels to maximize mutual information between graph-level and node-level representations.
  • Integrated DGSI with Singular Value Decomposition (SVD) and k-means++ for the DGSIST unsupervised cell clustering framework.
  • Applied DGSIST to various spatial transcriptomics datasets across different tissue types and technologies.
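
The mutual-information objective mentioned above can be illustrated with a small sketch. The summary does not give the DGSI loss, so this follows the general Deep Graph Infomax recipe as an assumption: node embeddings from the real graph should score high against a graph-level summary vector, and embeddings from a corrupted graph should score low. The discriminator here is a plain dot product (a learned bilinear form is more typical), and all function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def readout(embeddings):
    """Graph-level summary: sigmoid of the mean node embedding."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sigmoid(sum(h[d] for h in embeddings) / n) for d in range(dim)]

def infomax_loss(real, corrupted):
    """Binary cross-entropy over (node, summary) pairs.

    Real node embeddings are positives, corrupted ones are negatives.
    Assumption: the discriminator is sigmoid(h . s) with no learned weights.
    """
    s = readout(real)
    eps = 1e-12  # guards against log(0)
    pos = sum(math.log(sigmoid(dot(h, s)) + eps) for h in real)
    neg = sum(math.log(1.0 - sigmoid(dot(h, s)) + eps) for h in corrupted)
    return -(pos + neg) / (len(real) + len(corrupted))

# Toy embeddings (hypothetical).
real = [[2.0, 2.0], [3.0, 1.0]]
distinct_corruption = [[-2.0, -2.0], [-3.0, -1.0]]

# Low loss when corrupted nodes are easy to tell apart from real ones.
loss_separable = infomax_loss(real, distinct_corruption)
# Higher loss when the "corrupted" nodes are identical to the real ones.
loss_confused = infomax_loss(real, real)
```

In training, an encoder would be updated to lower this loss. The sketch only evaluates it on fixed vectors to show which configurations the objective rewards.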

Main Results

  • DGSIST accurately identifies cell types and spatial domains, outperforming existing methods.
  • The framework effectively eliminates batch effects without explicit correction.
  • Demonstrated robust performance across diverse tissue types and technological platforms.

Conclusions

  • DGSIST is a powerful unsupervised framework for cell clustering and spatial analysis using spatial transcriptomics data.
  • The model effectively captures local spatial information, leading to improved accuracy in identifying cellular structures.
  • DGSIST has significant potential for advancing the understanding of spatial organization in diseases like cancer.
