BigFiRSt: A Software Program Using Big Data Technique for Mining Simple Sequence Repeats From Large-Scale Sequencing

Jinxiang Chen1, Fuyi Li2,3,4, Miao Wang1

  • 1Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling, China.

Frontiers in Big Data
February 4, 2022
Summary
This summary is machine-generated.

BigFiRSt is a new Hadoop-based tool that efficiently merges short DNA sequence reads and identifies Simple Sequence Repeats (SSRs) using parallel processing. This accelerates analysis for non-model species in the era of big biological data.

Keywords:
Hadoop; Simple Sequence Repeats (SSR); big data; next-generation sequencing; read pairs

Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Simple Sequence Repeats (SSRs) are crucial genetic markers associated with human diseases.
  • Identifying SSRs traditionally requires complete genomes, which are often unavailable for non-model species.
  • Next-generation sequencing (NGS) generates vast amounts of data, posing big data challenges for SSR analysis.

Purpose of the Study:

  • To develop a novel big data software solution for efficient SSR identification from large-scale NGS data.
  • To address the limitations of traditional tools in handling massive datasets and merging short DNA read pairs.

Main Methods:

  • Developed BigFiRSt, a Hadoop-based software program utilizing parallel and distributed computing.
  • Integrated BigFLASH for merging overlapping short paired-end reads and BigPERF for SSR mining.
  • Leveraged big data technologies to enhance processing speed and scalability.
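
The two steps named above, merging an overlapping read pair and then mining the merged sequence for SSRs, can be illustrated with a minimal single-machine sketch. This is not the BigFLASH or BigPERF implementation: the function names, the exact-match overlap rule, and the thresholds (`min_overlap`, `min_motif`, `max_motif`, `min_repeats`) are all assumptions chosen for illustration, and quality scores and mismatches are ignored.

```python
import re
from typing import List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair on the longest exact suffix/prefix overlap.

    Assumption: exact matching only; real mergers tolerate mismatches.
    Returns None when no overlap of at least min_overlap bases exists.
    """
    mate = reverse_complement(reverse)
    for size in range(min(len(forward), len(mate)), min_overlap - 1, -1):
        if forward[-size:] == mate[:size]:
            return forward + mate[size:]
    return None


def find_ssrs(seq: str, min_motif: int = 2, max_motif: int = 6,
              min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Find perfect SSRs; return (start, motif, repeat_count) tuples."""
    # The lazy quantifier prefers the shortest motif; \1 repeats that motif.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    # Hypothetical 28-base fragment sequenced from both ends (20-base reads).
    fragment = "ATGGCT" + "AC" * 5 + "GTTAGGCATCCA"
    forward = fragment[:20]
    reverse = reverse_complement(fragment[8:])
    merged = merge_pair(forward, reverse)
    print(merged == fragment)
    print(find_ssrs(merged))  # [(6, 'AC', 5)]
```

Merging first matters because a repeat that straddles the boundary between the two reads is only visible as one SSR in the merged sequence.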

Main Results:

  • BigFiRSt significantly reduces execution times for read merging and SSR mining.
  • Demonstrated dramatic performance improvements on very large-scale DNA sequence datasets.
  • The software effectively handles the big data challenges inherent in NGS analysis.

Conclusions:

  • BigFiRSt leverages Hadoop technology for parallel and distributed processing of NGS data.
  • The tool is anticipated to be valuable for biological big data analysis, particularly for non-model organisms.
  • It enables efficient SSR discovery, which is crucial for genetic research and disease association studies.
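
The parallel model that Hadoop provides can be sketched in plain Python as three phases: map each read to (motif, 1) pairs, group the pairs by motif, and reduce each group to a count. The sketch below runs in a single process and uses no Hadoop API; the function names and the SSR thresholds (motifs of 2 to 6 bases, at least 3 repeats) are assumptions for illustration only and do not reflect how BigFiRSt is actually structured.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Perfect SSRs: a 2-6 base motif repeated at least 3 times (assumed thresholds).
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{2,}")


def map_read(read: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (motif, 1) for every perfect SSR in one read."""
    for match in SSR_PATTERN.finditer(read):
        yield match.group(1), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group emitted values by key (the motif)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the values collected for each motif."""
    return {motif: sum(values) for motif, values in grouped.items()}


def count_ssr_motifs(reads: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over a collection of reads."""
    pairs = (pair for read in reads for pair in map_read(read))
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    reads = ["TTACACACGG", "TTAGAGAGCC", "GGACACACACTT", "ACGTTGCA"]
    print(count_ssr_motifs(reads))
```

Because each read is mapped independently, the map phase can be split across many workers without coordination, which is the property a distributed framework exploits on large read sets.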