Fast alignment-free sequence comparison using spaced-word frequencies.

Chris-Andre Leimeister¹, Marcus Boden¹, Sebastian Horwege¹

¹Department of Bioinformatics, University of Göttingen, Institute of Microbiology and Genetics, 37073 Göttingen, Germany and Université d'Évry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA, 91037 Évry, France.

Bioinformatics (Oxford, England)

April 5, 2014

Summary

This summary is machine-generated.

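Spaced words, which combine 'match' and 'don't care' positions, make alignment-free sequence comparison more accurate for phylogeny reconstruction than contiguous words. Using multiple patterns reduces the statistical dependency between word matches while keeping the comparison fast.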

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Alignment-free methods offer faster sequence comparison than alignment-based approaches for genome analysis and phylogeny.
However, existing alignment-free methods often sacrifice accuracy due to statistical dependencies between adjacent word matches.

Purpose of the Study:

To introduce a novel alignment-free sequence comparison method using 'spaced words' to mitigate statistical dependencies.
To develop a fast and accurate computational approach for sequence analysis and phylogenetic reconstruction.

Main Methods:

Utilized 'spaced words,' defined by patterns of 'match' and 'don't care' positions, for sequence comparison.
Implemented a fast algorithm using recursive hashing and bit operations.
Employed multiple patterns to further enhance accuracy and reduce statistical dependency.
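
The methods listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 2-bit nucleotide encoding, the example patterns, and the choice of Euclidean distance on raw counts are all assumptions made for illustration. A pattern is written as a string in which '1' marks a match position and '0' a don't-care position. The sketch extracts spaced words directly, then repeats the extraction with a rolling integer encoding and a bit mask to show how bit operations can stand in for string slicing, and finally pools counts over several patterns into one profile per sequence.

```python
from collections import Counter
from math import sqrt

# Assumed 2-bit encoding of nucleotides (illustrative, not from the source).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_words(seq, pattern):
    """Return the spaced word at every window of seq.

    Characters under '1' (match) positions are kept; characters under
    '0' (don't care) positions are skipped.
    """
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    return ["".join(seq[s + i] for i in match_pos)
            for s in range(len(seq) - len(pattern) + 1)]


def spaced_word_hashes(seq, pattern):
    """Same windows as spaced_words, but as integers.

    The window encoding is updated from the previous window by one shift
    and one OR; a bit mask then zeroes the don't-care positions.
    """
    width = len(pattern)
    mask = 0
    for c in pattern:
        mask = (mask << 2) | (3 if c == "1" else 0)
    window_bits = (1 << (2 * width)) - 1
    h, out = 0, []
    for i, ch in enumerate(seq):
        h = ((h << 2) | CODE[ch]) & window_bits  # slide the window by one base
        if i >= width - 1:
            out.append(h & mask)                 # drop don't-care positions
    return out


def profile(seq, patterns):
    """Count spaced-word occurrences over several patterns, keyed by (pattern, word)."""
    counts = Counter()
    for p in patterns:
        counts.update((p, w) for w in spaced_words(seq, p))
    return counts


def euclidean_distance(a, b):
    """Euclidean distance between two count profiles (missing keys count as 0)."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in set(a) | set(b)))


if __name__ == "__main__":
    patterns = ["1101", "1011", "1001"]  # hypothetical pattern set
    s1, s2 = "ACGTACGTTGCA", "ACGAACGTTGGA"
    print(spaced_words(s1, "1101")[:3])
    print(euclidean_distance(profile(s1, patterns), profile(s2, patterns)))
```

Pooling counts over several patterns is the point of the multiple-pattern idea: a single substitution disturbs a different subset of words under each pattern, so the combined profile depends less on any one run of adjacent positions.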

Main Results:

Demonstrated that the multiple-pattern spaced-word approach significantly reduces statistical dependency between word matches.
Achieved improved accuracy in phylogenetic reconstruction compared to methods using contiguous words.
Validated the approach using both real-world and simulated sequence data.

Conclusions:

The proposed spaced-word method provides a more accurate and efficient alignment-free approach for sequence comparison.
This method offers a valuable tool for genome analysis and phylogeny reconstruction, addressing the accuracy limitations of alignment-free methods based on contiguous words.