FastSK: fast sequence analysis with gapped string kernels.

Derrick Blakely¹, Eamon Collins¹, Ritambhara Singh²

¹Department of Computer Science, University of Virginia, Charlottesville, VA, USA.

Bioinformatics (Oxford, England)

December 31, 2020

Summary

This summary is machine-generated.

FastSK is a new algorithm that significantly speeds up gapped k-mer kernel calculations for DNA sequence analysis. It matches or exceeds existing methods while being much faster, which makes larger feature lengths and more mismatches practical.

Area of Science:

Bioinformatics
Computational Biology
Machine Learning

Background:

Gapped k-mer kernels with support vector machines (gkm-SVMs) excel at predicting regulatory DNA sequences.
Existing gkm-SVM algorithms are computationally intensive due to slow kernel computation, limiting their scalability with feature length, mismatches, and alphabet size.
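
To make the cost concrete, here is a minimal Python sketch of a gapped k-mer kernel computed the direct way. It assumes the common definition in which every window of length g contributes one feature for each choice of k kept positions; the function names and the parameters `g` and `k` are illustrative and are not taken from FastSK or any gkm-SVM package.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, g, k):
    """Count gapped k-mer features of seq.

    Assumed definition: slide a window of length g over seq and, for every
    choice of k kept positions inside the window, record the kept letters.
    """
    counts = Counter()
    for start in range(len(seq) - g + 1):
        window = seq[start:start + g]
        for kept in combinations(range(g), k):
            feature = (kept, "".join(window[i] for i in kept))
            counts[feature] += 1
    return counts


def gapped_kmer_kernel(x, y, g, k):
    """Kernel value = dot product of the two feature-count vectors."""
    cx = gapped_kmer_counts(x, g, k)
    cy = gapped_kmer_counts(y, g, k)
    return sum(n * cy[feature] for feature, n in cx.items())


if __name__ == "__main__":
    print(gapped_kmer_kernel("ACGT", "ACGA", g=3, k=2))
```

The inner loop visits every one of the C(g, k) position sets for every window, so the work grows quickly as the feature length or the number of allowed gaps increases. That is the scaling problem described above.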

Purpose of the Study:

Introduce a fast and scalable algorithm for computing gapped k-mer string kernels.
Improve the efficiency and applicability of gkm-SVMs for sequence analysis tasks.

Main Methods:

Developed FastSK, a novel algorithm that simplifies the kernel formulation into a series of independent counting operations.
Employed a fast Monte Carlo approximation for rapid convergence.
Applied FastSK to DNA transcription factor binding site prediction, medical named entity recognition, and protein remote homology detection datasets.
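
The two ideas in the first two points, independent counting operations and a Monte Carlo approximation, can be illustrated with a simplified Python sketch. This is not the FastSK implementation: it assumes a kernel that sums, over every set of k kept positions in a length-g window, the number of window pairs agreeing on those positions, and it omits any weighting coefficients the actual algorithm may apply. All names (`count_for_positions`, `exact_kernel`, `monte_carlo_kernel`, `n_samples`, `seed`) are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def count_for_positions(x, y, g, kept):
    """One independent counting operation.

    Returns the number of (window of x, window of y) pairs that agree on
    the kept positions; all other positions are ignored.
    """
    cx = Counter(tuple(x[s + i] for i in kept) for s in range(len(x) - g + 1))
    cy = Counter(tuple(y[s + i] for i in kept) for s in range(len(y) - g + 1))
    return sum(n * cy[key] for key, n in cx.items())


def exact_kernel(x, y, g, k):
    """Sum the counts over all C(g, k) position sets."""
    return sum(
        count_for_positions(x, y, g, kept)
        for kept in combinations(range(g), k)
    )


def monte_carlo_kernel(x, y, g, k, n_samples, seed=0):
    """Estimate the same sum from randomly sampled position sets.

    The sample mean of the per-set counts is scaled by the number of
    position sets, which gives an unbiased estimate of the exact sum.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        kept = tuple(sorted(rng.sample(range(g), k)))
        total += count_for_positions(x, y, g, kept)
    return math.comb(g, k) * total / n_samples


if __name__ == "__main__":
    x, y = "ACGTACGTTGCA", "ACGAACGTTGGA"
    print("exact:    ", exact_kernel(x, y, g=6, k=4))
    print("estimated:", monte_carlo_kernel(x, y, g=6, k=4, n_samples=8))
```

Because each position set is counted independently, the counts can be computed in any order or in parallel, and sampling only some of the sets trades a small amount of accuracy for a large reduction in work when C(g, k) is big.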

Main Results:

FastSK achieves comparable or superior performance to state-of-the-art gkm-SVMs on DNA binding site prediction.
Demonstrated significant speedups: ~100x average and ~800x for large feature lengths.
Outperformed recurrent and convolutional neural networks on DNA sequence tasks while showing low variance.
Matched or exceeded baseline performance on medical NER and protein homology detection datasets.

Conclusions:

FastSK offers a computationally efficient and scalable solution for gapped k-mer kernel calculations.
The algorithm enhances the practical application of gkm-SVMs across diverse sequence analysis domains.
FastSK provides a robust alternative to existing methods and deep learning approaches.