



Halvade somatic: Somatic variant calling with Apache Spark.

Dries Decap¹, Louise de Schaetzen van Brienen¹, Maarten Larmuseau¹

  • ¹ IDLab, Ghent University - imec, Technologiepark 126, B-9052 Ghent, Belgium.

GigaScience | January 13, 2022
Summary
This summary is machine-generated.

Halvade Somatic accelerates somatic variant calling on cancer DNA sequencing data using Apache Spark, cutting runtime from 84.5 h to 19.5 h on a single 36-core node and to 1.36 h on 16 nodes. The framework scales across multi-core and multi-node platforms and is aimed at both researchers and clinicians.

Keywords:
Apache Spark, GATK/Mutect2, Strelka2, somatic variant calling


Area of Science:

  • Genomics and Bioinformatics
  • Computational Biology
  • Cancer Research

Background:

  • Accurate somatic variant detection is crucial for cancer treatment and research.
  • High sequencing depth is required for detecting low-frequency variants, leading to large data volumes and computational demands.
  • Current GATK best practices pipelines require extensive computing time for whole-genome sequencing data.

Purpose of the Study:

  • To introduce Halvade Somatic, a framework designed to reduce the runtime of somatic variant calling.
  • To leverage multi-node and multi-core platforms for scalable and efficient processing of DNA sequencing data.
  • To provide a reliable and accessible tool for analyzing tumor and matched normal samples.

Main Methods:

  • Utilizes Apache Spark for scalable I/O and parallel processing of data streams across CPU cores.
  • Integrates standard GATK best practices steps: BWA alignment, read sorting, duplicate marking, and base quality score recalibration.
  • Incorporates Mutect2 for somatic variant calling and supports Strelka2 as an alternative tool.
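
The parallelization idea behind these methods can be illustrated with a small scatter/gather sketch. This is not Halvade Somatic's code and it does not use Apache Spark: a standard-library thread pool stands in for the Spark executors, and the names `scatter`, `process_region`, and `run_pipeline` are hypothetical. The per-region step only sorts read positions as a placeholder for the real sorting, duplicate marking, recalibration, and variant calling tools.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# A toy "read" is just (chromosome, position). Real pipelines work on
# aligned reads in BAM/SAM records; this is an illustrative assumption.


def scatter(reads, region_size):
    """Group reads by genomic region so each region can be processed alone."""
    buckets = defaultdict(list)
    for chrom, pos in reads:
        # Region index is the position divided by the fixed region size.
        buckets[(chrom, pos // region_size)].append(pos)
    return buckets


def process_region(item):
    """Placeholder for the per-region pipeline (hypothetical stand-in).

    Here it only sorts the positions; in a real workflow this is where
    external tools would run on the reads of one region.
    """
    (chrom, index), positions = item
    return (chrom, index, sorted(positions))


def run_pipeline(reads, region_size, workers=4):
    """Scatter reads into regions, process regions in parallel, gather in order."""
    buckets = scatter(reads, region_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_region, buckets.items()))
    # Gather step: restore a deterministic genomic order.
    return sorted(results)


toy_reads = [("chr1", 150), ("chr1", 20), ("chr2", 5), ("chr1", 130)]
toy_result = run_pipeline(toy_reads, region_size=100)
```

Because each region is handled independently, the same pattern extends from CPU cores to multiple nodes, which is the role a framework such as Apache Spark plays in the approach described above.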

Main Results:

  • Achieved a 4.3x speedup on a single 36-core node, reducing runtime from 84.5 h to 19.5 h.
  • Reduced runtime further to 1.36 h on 16 nodes, an additional 14.4x improvement over the single-node run.
  • Supports both whole-genome sequencing (WGS) and whole-exome sequencing (WXS) data, with a Docker image for easy deployment.
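
The speedup figures follow directly from the runtimes quoted above, and a short calculation makes the relationship explicit. The sketch below uses only the runtimes stated in this summary; the `speedup` helper and the parallel-efficiency figure (multi-node speedup divided by node count) are additions for illustration, not values reported by the authors. Computed from the rounded runtimes, the multi-node factor comes out at roughly 14.3, slightly below the quoted 14.4x, which is presumably a rounding difference.

```python
def speedup(baseline_hours, new_hours):
    """Speedup is the ratio of the old runtime to the new runtime."""
    return baseline_hours / new_hours


ORIGINAL_HOURS = 84.5      # original pipeline, as quoted in the summary
SINGLE_NODE_HOURS = 19.5   # one 36-core node
MULTI_NODE_HOURS = 1.36    # 16 nodes
NODES = 16

single_node_speedup = speedup(ORIGINAL_HOURS, SINGLE_NODE_HOURS)
multi_node_speedup = speedup(SINGLE_NODE_HOURS, MULTI_NODE_HOURS)
overall_speedup = speedup(ORIGINAL_HOURS, MULTI_NODE_HOURS)

# Parallel efficiency across nodes: 1.0 would mean perfect linear scaling.
node_efficiency = multi_node_speedup / NODES

print(f"single node vs original: {single_node_speedup:.2f}x")
print(f"16 nodes vs single node: {multi_node_speedup:.2f}x")
print(f"16 nodes vs original:    {overall_speedup:.2f}x")
print(f"efficiency over {NODES} nodes: {node_efficiency:.2f}")
```

The two stage-wise factors multiply to give the overall factor, so the 16-node run is roughly 62 times faster than the original pipeline.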

Conclusions:

  • Halvade Somatic is the first somatic variant calling pipeline to effectively utilize Big Data processing platforms.
  • The framework offers reliable, scalable performance for somatic variant detection.
  • Source code is freely available, facilitating wider adoption in cancer research and clinical applications.