A New Approach to Large Multiomics Data Integration
Summary
This summary is machine-generated. We introduce a novel deep learning approach for integrating large multi-omics datasets. The method extracts features from and fuses diverse biological data, overcoming the computational limits that previously made such datasets intractable.
Area Of Science
- Computational Biology
- Bioinformatics
- Systems Biology
Background
- High-dimensional omics and imaging data pose challenges for feature extraction and data mining.
- Existing nonlinear dimensionality reduction methods like t-SNE and UMAP excel at visualization but struggle with very large datasets.
- Integrating multi-omics data is crucial for a holistic understanding of systems biology.
Purpose Of The Study
- To develop a new approach for extracting, mining, and integrating large multi-omics datasets.
- To overcome the limitations of current algorithms in handling prohibitively large data.
Main Methods
- Applied deep learning to nonlinear dimensionality reductions (t-SNE and UMAP) computed on subsampled data.
- Applied the method to extract features from mass spectrometry imaging and chromosome conformation capture data.
- Demonstrated learning embeddings from fused omics data, projecting metabolomics into a reduced transcriptomics representation.
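The subsample-then-learn idea in the methods above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the subsample size, the network shape, and the choice of scikit-learn's `TSNE` and `MLPRegressor` are all assumptions made for illustration. The sketch runs t-SNE on a small random subsample only, trains a neural network to reproduce that embedding from the raw features, and then applies the network to every row of the full matrix.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a large omics matrix (rows = pixels or cells,
# columns = features). Real data would be far larger; this is an assumption.
n_samples, n_features = 2000, 20
centers = rng.normal(scale=5.0, size=(3, n_features))
labels = rng.integers(0, 3, size=n_samples)
X = centers[labels] + rng.normal(size=(n_samples, n_features))

# 1. Run t-SNE on a manageable random subsample only.
subsample = rng.choice(n_samples, size=300, replace=False)
embedding_sub = TSNE(
    n_components=2, perplexity=30, random_state=0
).fit_transform(X[subsample])

# 2. Train a neural network to reproduce the subsample embedding
#    from the raw features (hypothetical architecture).
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X[subsample], embedding_sub)

# 3. Apply the trained network to every row, including rows that
#    t-SNE itself never saw.
embedding_all = net.predict(X)
print(embedding_all.shape)
```

Because the expensive nonlinear reduction only ever sees the subsample, its cost no longer grows with the size of the full dataset; the trained network handles the remaining rows with a cheap forward pass.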
Main Results
- Successfully extracted features from complex datasets previously considered too large to process.
- Enabled the fusion of different omics data through learned embeddings.
- Showcased the projection of metabolomics data into a reduced transcriptomics space.
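One way to picture projecting metabolomics data into a reduced transcriptomics space is sketched below. This is a hedged illustration, not the published pipeline: it assumes paired samples measured in both omics layers, uses synthetic data, and swaps in scikit-learn's `TSNE` and `MLPRegressor` with hypothetical settings. The transcriptomics layer is reduced to two dimensions, and a network is then trained to predict those coordinates from the metabolomics features alone.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical paired measurements: the same 400 samples profiled in both
# layers, driven by a shared low-dimensional latent signal (an assumption).
n_samples = 400
latent = rng.normal(size=(n_samples, 3))
transcriptomics = latent @ rng.normal(size=(3, 50)) + 0.1 * rng.normal(size=(n_samples, 50))
metabolomics = latent @ rng.normal(size=(3, 15)) + 0.1 * rng.normal(size=(n_samples, 15))

train = np.arange(300)
held_out = np.arange(300, 400)

# Reduce the transcriptomics layer to two dimensions on the training samples.
reduced_rna = TSNE(
    n_components=2, perplexity=30, random_state=0
).fit_transform(transcriptomics[train])

# Learn a map from metabolomics features to the reduced
# transcriptomics coordinates.
projector = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
projector.fit(metabolomics[train], reduced_rna)

# Place held-out metabolomics samples in the transcriptomics-derived space.
projected = projector.predict(metabolomics[held_out])
print(projected.shape)
```

The learned network acts as the bridge between modalities: samples with only metabolomics measurements can be positioned in a space defined by transcriptomics structure.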
Conclusions
- The proposed deep learning approach effectively integrates large multi-omics datasets.
- This method advances the analysis of complex biological datasets, enabling new insights in systems biology.
- Facilitates a more comprehensive understanding by fusing diverse biological information streams.