



FastSK: fast sequence analysis with gapped string kernels.

Derrick Blakely¹, Eamon Collins¹, Ritambhara Singh²

  • ¹Department of Computer Science, University of Virginia, Charlottesville, VA, USA.

Bioinformatics (Oxford, England) | December 31, 2020
Summary
This summary is machine-generated.

FastSK is an algorithm that speeds up gapped k-mer kernel computation for sequence analysis. It matches or exceeds the prediction performance of existing gkm-SVM methods while computing the kernel about 100x faster on average, which makes longer features and more mismatches practical.


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Machine Learning

Background:

  • Gapped k-mer kernels with support vector machines (gkm-SVMs) excel at predicting regulatory DNA sequences.
  • Existing gkm-SVM algorithms are bottlenecked by slow kernel computation, so they scale poorly with feature length, number of mismatches, and alphabet size.
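
To make the background concrete, here is a minimal sketch of a naive gapped k-mer kernel in Python. It is an illustration under stated assumptions, not the implementation used by any gkm-SVM package: the function names and the parameters `g` (total feature length) and `k` (number of informative positions) are chosen here for the example. The kernel value is the number of gapped k-mers shared by two sequences, summed over every way of keeping `k` of the `g` positions.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, g, keep):
    """Count gapped k-mers of seq: slide a window of length g and keep
    only the characters at the (hypothetical) positions in `keep`."""
    counts = Counter()
    for i in range(len(seq) - g + 1):
        window = seq[i:i + g]
        counts["".join(window[p] for p in keep)] += 1
    return counts


def gapped_kmer_kernel(x, y, g, k):
    """Naive kernel: for every choice of k kept positions out of g,
    count matching gapped k-mers between x and y, then sum the counts."""
    total = 0
    for keep in combinations(range(g), k):
        cx = gapped_kmer_counts(x, g, keep)
        cy = gapped_kmer_counts(y, g, keep)
        total += sum(cx[key] * cy[key] for key in cx.keys() & cy.keys())
    return total


if __name__ == "__main__":
    # The two sequences differ only in the middle character.
    print(gapped_kmer_kernel("ACG", "ATG", g=3, k=2))
```

The outer loop runs over all C(g, k) position choices, so the cost of this naive version grows quickly with feature length and number of mismatches, which is the scalability problem described above.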

Purpose of the Study:

  • Introduce a fast and scalable algorithm for computing gapped k-mer string kernels.
  • Improve the efficiency and applicability of gkm-SVMs for sequence analysis tasks.

Main Methods:

  • Developed FastSK, an algorithm that decomposes the kernel computation into independent counting operations over the possible mismatch positions.
  • Used a Monte Carlo approximation of this decomposition that converges rapidly.
  • Applied FastSK to DNA transcription factor binding site prediction, medical named entity recognition, and protein remote homology detection datasets.
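
The two ideas in the methods above, independent counting per choice of positions and a Monte Carlo approximation, can be sketched roughly as follows. This is a simplified illustration, not the authors' FastSK code: the function names, the fixed `n_samples` budget, and the `seed` parameter are assumptions made for the example, and the actual convergence criterion is not reproduced. Each sampled set of kept positions yields one independent match count, and the average over samples, scaled by the number of possible position sets, estimates the full kernel value.

```python
import math
import random
from collections import Counter


def subset_match_count(x, y, g, keep):
    """One independent counting operation: number of matching gapped
    k-mers between x and y for a single set of kept positions."""
    def counts(seq):
        return Counter(
            tuple(seq[i + p] for p in keep)
            for i in range(len(seq) - g + 1)
        )
    cx, cy = counts(x), counts(y)
    return sum(n * cy[key] for key, n in cx.items())


def mc_gapped_kmer_kernel(x, y, g, k, n_samples=100, seed=0):
    """Monte Carlo estimate of the gapped k-mer kernel: sample position
    sets uniformly at random instead of enumerating all C(g, k) of them.
    The fixed sample budget is an assumption made for this sketch."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        keep = sorted(rng.sample(range(g), k))
        total += subset_match_count(x, y, g, keep)
    # Scale the per-subset average up to the full number of subsets.
    return math.comb(g, k) * total / n_samples


if __name__ == "__main__":
    print(mc_gapped_kmer_kernel("ACGTACGT", "ACGAACGT", g=5, k=3))
```

Because each sampled position set is counted independently, the per-sample work can be run in parallel, and the sample budget trades accuracy for speed.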

Main Results:

  • FastSK achieves comparable or superior performance to state-of-the-art gkm-SVMs on DNA binding site prediction.
  • FastSK computes the kernel about 100x faster on average and about 800x faster at large feature lengths.
  • FastSK outperformed character-level recurrent and convolutional neural networks while showing low variance.
  • Matched or exceeded baseline performance on medical NER and protein homology detection datasets.

Conclusions:

  • FastSK offers a computationally efficient and scalable solution for gapped k-mer kernel calculations.
  • The algorithm enhances the practical application of gkm-SVMs across diverse sequence analysis domains.
  • FastSK provides a robust alternative to existing methods and deep learning approaches.