Accelerating spliced alignment of long RNA sequencing reads using parallel maximal exact match retrieval

  • School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.

Summary

This summary is machine-generated.

A new parallel algorithm for maximal exact match (MEM) retrieval speeds up RNA sequencing (RNA-seq) data analysis. The approach loses no accuracy and significantly improves the computational efficiency of spliced alignment algorithms such as uLTRA.

Area Of Science

  • Genomics and Bioinformatics
  • Computational Biology
  • Next-Generation Sequencing Data Analysis

Background

  • Third-generation sequencing technologies have revolutionized genomics, generating vast amounts of data.
  • The surge in sequencing data necessitates accurate and rapid algorithms for efficient processing.
  • Computational bottlenecks exist in spliced alignment algorithms, particularly for long RNA sequencing (RNA-seq) reads.

Purpose Of The Study

  • To develop a parallelized, accuracy-lossless algorithm for maximal exact match (MEM) retrieval.
  • To address the computational bottleneck in the uLTRA spliced alignment algorithm.
  • To accelerate the analysis of large-scale RNA-seq datasets.

Main Methods

  • Developed a multi-threaded algorithm for concurrent processing of multiple reads.
  • Implemented serialization of the index for reusable MEM retrieval, reducing startup time.
  • Integrated the parallel MEM retrieval algorithm into the uLTRA pipeline.
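The three steps above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the k-mer seed-and-extend index is a naive stand-in for a real MEM finder, and all names (`build_index`, `save_index`, `load_index`, `find_mems`, `find_mems_parallel`) and the toy sequences are hypothetical. It shows an index that is built once and serialized for reuse, and a pool of threads that processes multiple reads concurrently against the shared, read-only index.

```python
import os
import pickle
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def build_index(ref, k):
    """Map each k-mer of ref to its start positions (toy stand-in for a real MEM index)."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return dict(index)


def save_index(index, path):
    """Serialize the index so later runs can skip construction."""
    with open(path, "wb") as f:
        pickle.dump(index, f)


def load_index(path):
    """Load a previously serialized index."""
    with open(path, "rb") as f:
        return pickle.load(f)


def find_mems(ref, index, k, read):
    """Return sorted (read_pos, ref_pos, length) exact matches of length >= k
    that cannot be extended in either direction."""
    mems = []
    for q in range(len(read) - k + 1):
        for r in index.get(read[q:q + k], ()):
            # A seed that extends to the left belongs to a match that an
            # earlier seed already reports, so skip it.
            if q > 0 and r > 0 and read[q - 1] == ref[r - 1]:
                continue
            length = k
            while (q + length < len(read) and r + length < len(ref)
                   and read[q + length] == ref[r + length]):
                length += 1
            mems.append((q, r, length))
    return sorted(mems)


def find_mems_parallel(ref, index, k, reads, workers=4):
    """Process reads concurrently; results come back in input order."""
    worker = partial(find_mems, ref, index, k)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, reads))


# Hypothetical toy data, chosen only to make the example easy to follow.
REF = "GATTACAGGCCTTAAC"
K = 4
READS = ["TTACAGG", "CCTTAAGATT"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ref.idx")
    save_index(build_index(REF, K), path)  # done once
    index = load_index(path)               # reused on later runs

results = find_mems_parallel(REF, index, K, READS)
print(results)
```

The second toy read produces two matches at distant reference positions, which is the pattern a read spanning a splice junction would show. Note that in CPython the global interpreter lock prevents pure-Python threads from running in parallel, so this sketch shows the structure of the approach rather than its performance; a compiled implementation would be needed to see real multi-threaded gains.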

Main Results

  • The parallel algorithm showed significant gains in runtime, throughput, and memory usage.
  • Achieved a 10.78x speedup on the largest human dataset, enhancing large-scale throughput.
  • The integrated uLTRA pipeline achieved a 4.99x speedup compared to multi-process, single-threaded execution.
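To put the reported figures in terms of runtime, assume the usual definition of speedup as the ratio of baseline to parallel wall-clock time (the summary does not state the definition; the symbols below are introduced here for illustration). Under that assumption, the parallel runtime as a fraction of the baseline is the reciprocal of the speedup:

```latex
S = \frac{T_{\text{baseline}}}{T_{\text{parallel}}}
\quad\Longrightarrow\quad
\frac{T_{\text{parallel}}}{T_{\text{baseline}}} = \frac{1}{S},
\qquad
\frac{1}{10.78} \approx 0.093,
\qquad
\frac{1}{4.99} \approx 0.200
```

Read this way, MEM retrieval on the largest human dataset would take roughly 9% of its baseline time, and the integrated pipeline roughly 20% of the multi-process, single-threaded time.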

Conclusions

  • Parallelized strategies for MEM retrieval effectively address computational challenges in spliced alignment.
  • The developed algorithm significantly enhances the performance of RNA-seq data analysis pipelines.
  • This work showcases the advantages of parallel processing for handling large genomic datasets.
