


kmer2vec: A Novel Method for Comparing DNA Sequences by word2vec Embedding.

Ruohan Ren (1), Changchuan Yin (2), Stephen S-T Yau (1)

  • (1) Department of Mathematical Sciences, Tsinghua University, Beijing, China.

Journal of Computational Biology: a Journal of Computational Molecular Cell Biology | May 20, 2022
Summary
This summary is machine-generated.

A new kmer2vec method uses natural language processing to embed DNA k-mers into vectors, enabling faster and more accurate genomic comparisons for phylogenetic analysis and large-scale genome studies.

Keywords:
DNA sequence, SARS-CoV-2, genome, k-mer, phylogeny, word2vec


Area of Science:

  • Genomics and Bioinformatics
  • Computational Biology
  • Evolutionary Biology

Background:

  • Traditional multiple sequence alignment (MSA) is computationally intensive for large datasets.
  • Existing k-mer alignment-free methods may not fully capture sequence contextual structures.
  • Genomic sequence comparison is crucial for evolutionary and phylogenetic analysis.
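
For orientation, a conventional alignment-free comparison of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the baseline used in the paper: the function names, the choice of Euclidean distance, and the restriction to the alphabet ACGT are all assumptions. Each sequence becomes a vector of k-mer frequencies, so the order in which k-mers occur along the sequence is lost.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: ambiguous bases such as N are ignored


def kmer_frequency_vector(seq, k):
    """Return the relative frequency of every possible k-mer, in lexicographic order."""
    seq = seq.upper()
    vocab = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in vocab)  # only k-mers made of A, C, G, T
    if total == 0:
        raise ValueError("sequence contains no valid k-mers of this length")
    return [counts[w] / total for w in vocab]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = kmer_frequency_vector("ACGTACGTAC", 2)
    b = kmer_frequency_vector("ACGTTTGTAC", 2)
    print(euclidean(a, b))
```

Because the vector only records how often each k-mer appears, two sequences with the same k-mer counts in a different arrangement are indistinguishable, which is the contextual gap the second background point refers to.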

Purpose of the Study:

  • To introduce kmer2vec, a novel alignment-free method for DNA sequence comparison that takes the context of each k-mer into account.
  • To leverage natural language processing techniques (word2vec) for semantic embedding of k-mers.
  • To improve the accuracy and speed of phylogenetic analysis, especially for large genomes.

Main Methods:

  • Semantically embed sequence k-mers into word2vec vectors, capturing contextual relationships.
  • Represent DNA/RNA sequences as points in a high-dimensional word2vec space for comparison.
  • Optimize word2vec parameters and validate the method on large coronavirus and bacterial genomes.
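
The first two steps above can be sketched roughly as follows. This is a hedged illustration, not the authors' implementation: the summary does not state the k-mer length, the context window, or how k-mer vectors are combined into one point per sequence, so `tokenize`, `skipgram_pairs`, `sequence_vector`, the averaging step, and the toy embedding table are all assumptions. In practice the embedding table would come from a trained word2vec model (for example via a library such as gensim) rather than being written by hand.

```python
def tokenize(seq, k):
    """Split a sequence into overlapping k-mers, treated as the 'words' of a sentence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def skipgram_pairs(tokens, window):
    """List (center, context) pairs, the training examples a skip-gram model learns from."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def sequence_vector(tokens, embedding):
    """Average the vectors of all known k-mers (assumed aggregation, not confirmed by the source)."""
    known = [embedding[t] for t in tokens if t in embedding]
    if not known:
        raise ValueError("no k-mer of this sequence has an embedding")
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


if __name__ == "__main__":
    tokens = tokenize("ACGTA", 3)
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
    # Hypothetical 2-dimensional embedding table standing in for trained word2vec vectors.
    toy_embedding = {"ACG": [1.0, 0.0], "CGT": [0.0, 1.0], "GTA": [1.0, 1.0]}
    print(sequence_vector(tokens, toy_embedding))
```

The (center, context) pairs are what lets the embedding reflect which k-mers tend to occur near each other, information that a plain k-mer count vector discards.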

Main Results:

  • kmer2vec demonstrates effectiveness in phylogenetic tree construction and species clustering.
  • The method is significantly faster than MSA and more accurate than conventional k-mer methods.
  • Accurate phylogenetic relationships are derived, enabling analysis of large-scale genomic data.
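
Once each genome is a point in the embedding space, tree construction and clustering typically start from a pairwise distance matrix. The sketch below assumes cosine distance and uses made-up two-dimensional vectors with hypothetical genome names; the summary does not say which distance or tree-building method the authors used, so treat this purely as an illustration of the idea.

```python
import math


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Build a symmetric distance matrix from a {name: vector} mapping, names sorted."""
    names = sorted(vectors)
    matrix = [[cosine_distance(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


def closest_pair(names, matrix):
    """Find the two distinct entries with the smallest distance."""
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if best is None or matrix[i][j] < best[0]:
                best = (matrix[i][j], names[i], names[j])
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical sequence vectors; real ones would come from the embedding step.
    toy = {"genome_A": [1.0, 0.0], "genome_B": [0.9, 0.1], "genome_C": [0.0, 1.0]}
    names, matrix = distance_matrix(toy)
    print(closest_pair(names, matrix))
```

A matrix like this could then be passed to a standard tree-building or clustering routine; repeatedly merging the closest pair is the basic step behind agglomerative approaches.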

Conclusions:

  • kmer2vec offers a computationally efficient and accurate approach for genomic sequence comparison.
  • The method provides new insights into phylogeny and evolution, facilitating large genome analysis.
  • Potential for rapid SARS-CoV-2 typing by combining kmer2vec with clustering techniques.