Neutrality tests for sequences with missing data.

Luca Ferretti¹, Emanuele Raineri, Sebastian Ramos-Onsins

¹Centre for Research in Agricultural Genomics, 08193 Bellaterra, Spain. luca.ferretti@uab.cat

June 5, 2012

Summary

This summary is machine-generated.

This study introduces modified variability estimators and neutrality tests that can be applied directly to DNA sequences with missing data, without first removing incomplete bases or individuals. The same framework extends frequency spectrum-based neutrality tests to other data types in genomics.

Area of Science:

Genomics
Bioinformatics
Population Genetics

Background:

High-throughput DNA sequencing frequently yields data with missing bases or individuals.
Low-quality samples and experimental issues exacerbate data loss in sequencing.
Existing methods often require removing incomplete data, potentially biasing results.

Purpose of the Study:

To develop modified variability estimators and neutrality tests for DNA sequences containing missing data.
To provide a general framework for incorporating missing data into frequency spectrum-based neutrality tests.
To derive exact variance expressions for these statistics under the neutral model.

Main Methods:

Modified versions of the Watterson estimator (θW), Tajima's D, Fay and Wu's H, and the HKA test statistic.
Development of a general framework for frequency spectrum-based neutrality tests with missing data.
Derivation of exact variance expressions under the neutral model.
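
To make the idea of estimators that tolerate missing data more concrete, here is a minimal Python sketch. It is not the authors' implementation, and the exact formulas in the paper may differ. It assumes that missing bases are coded as "N", that each site contributes its own sample size n_x (the number of individuals actually sequenced at that site), and that the Watterson-type estimator divides the number of segregating sites by the sum of the harmonic terms a_{n_x} over sites. The function names and the simple per-site averaging used for pairwise diversity are illustrative assumptions only. The variance needed to normalise Tajima's D is not attempted here.

```python
from typing import Dict, Optional, Sequence, Tuple

MISSING = "N"  # assumed code for a missing base (hypothetical convention)


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i, the standard Watterson normalisation."""
    return sum(1.0 / i for i in range(1, n))


def site_counts(column: Sequence[str]) -> Tuple[int, Dict[str, int]]:
    """Count observed alleles at one site, ignoring missing bases."""
    counts: Dict[str, int] = {}
    for base in column:
        if base != MISSING:
            counts[base] = counts.get(base, 0) + 1
    return sum(counts.values()), counts


def watterson_missing(alignment: Sequence[str]) -> Optional[float]:
    """Per-base Watterson-type estimate: S divided by the sum of a_{n_x}.

    Sites with fewer than two observed bases carry no information and are skipped.
    """
    segregating = 0
    denom = 0.0
    for column in zip(*alignment):
        n_x, counts = site_counts(column)
        if n_x < 2:
            continue
        denom += harmonic(n_x)
        if len(counts) > 1:
            segregating += 1
    return segregating / denom if denom > 0 else None


def pairwise_diversity_missing(alignment: Sequence[str]) -> Optional[float]:
    """Per-base pairwise diversity, averaging over sites with n_x >= 2.

    At each site, only pairs of individuals that were both sequenced are compared.
    """
    total = 0.0
    usable = 0
    for column in zip(*alignment):
        n_x, counts = site_counts(column)
        if n_x < 2:
            continue
        usable += 1
        pairs = n_x * (n_x - 1) / 2
        identical = sum(c * (c - 1) / 2 for c in counts.values())
        total += (pairs - identical) / pairs
    return total / usable if usable else None


if __name__ == "__main__":
    # Toy alignment: four individuals, four sites, two missing bases.
    toy = ["ACGT", "ACGA", "ANGT", "ACNT"]
    theta_w = watterson_missing(toy)
    theta_pi = pairwise_diversity_missing(toy)
    print("theta_W per base:", theta_w)
    print("theta_pi per base:", theta_pi)
    # Unnormalised numerator of a Tajima's D-like statistic.
    print("theta_pi - theta_W:", theta_pi - theta_w)
```

With complete data every n_x equals the full sample size n, so the first function reduces to the usual S / (L · a_n) per base. This is the behaviour one would expect from any estimator that generalises the standard one.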

Main Results:

Modified estimators and neutrality tests can be applied directly to sequences with missing data.
The proposed methods avoid the need to remove bases or individuals from analysis.
The framework allows for the use of neutrality tests as summary statistics for other data types, such as DNA microarrays.

Conclusions:

The developed methods offer a robust approach to analyzing DNA sequence data with missing values.
These advancements improve the accuracy and efficiency of population genetics analyses.
The framework has broader applications in analyzing diverse biological datasets.