



Robust Score Tests With Missing Data in Genomics Studies.

Kin Yau Wong (1), Donglin Zeng (2), D Y Lin (2)

  • (1) Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong.

Journal of the American Statistical Association | January 11, 2020
Summary
This summary is machine-generated.

This study introduces a robust score test for association analysis of genomic data with missing values. The score statistic remains asymptotically unbiased even when the imputation model is misspecified, which makes association testing with imputed data more reliable.

Keywords:
Association tests, Imputation, Integrative analysis, Multiple genomics platforms, Semiparametric models, Sieve estimation



Area of Science:

  • Genomics
  • Statistical Genetics
  • Bioinformatics

Background:

  • Missing values in genomic data analysis complicate association studies.
  • Traditional single imputation methods are unreliable if the imputation model is misspecified.

Purpose of the Study:

  • To develop a robust score statistic for testing phenotype-genomic variable associations with missing data.
  • To address limitations of existing imputation methods in genomic analyses.

Main Methods:

  • Proposed a semiparametric regression model for genomic variables with missing values.
  • Imputed each missing value by its estimated posterior expectation.
  • Developed a spline-based method to estimate the model and derived the asymptotic distribution of the score statistic using sieve and empirical process theory.
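
The impute-then-test workflow described above can be illustrated with a minimal Python sketch. It is a simplification, not the authors' method: a plain linear working model x ~ 1 + z stands in for the paper's spline-based semiparametric model, the variance is a simple empirical estimate, and the function names (fit_line, impute_then_score_test), the simulated data, and all coefficients are hypothetical choices made for illustration.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept and slope for y ~ 1 + x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_then_score_test(y, z, x):
    """Score test of H0: no association between y and x, adjusting for z.

    x may contain None for missing values. These are replaced by fitted
    values from a working linear model x ~ 1 + z (an assumption of this
    sketch, standing in for the paper's semiparametric model).
    """
    # Working imputation model, fitted on complete cases only.
    obs = [(zi, xi) for zi, xi in zip(z, x) if xi is not None]
    a, b = fit_line([p[0] for p in obs], [p[1] for p in obs])
    x_filled = [xi if xi is not None else a + b * zi for zi, xi in zip(z, x)]

    # Null phenotype model y ~ 1 + z and its residuals.
    a0, b0 = fit_line(z, y)
    resid = [yi - (a0 + b0 * zi) for yi, zi in zip(y, z)]

    # Remove the part of x explained by z so the score is adjusted for z.
    a1, b1 = fit_line(z, x_filled)
    x_adj = [xi - (a1 + b1 * zi) for xi, zi in zip(x_filled, z)]

    score = sum(r * v for r, v in zip(resid, x_adj))
    variance = sum((r * v) ** 2 for r, v in zip(resid, x_adj))  # empirical variance
    stat = score ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return stat, p_value


# Simulated example (all values hypothetical): x depends on z nonlinearly,
# so the linear working model is deliberately misspecified.
rng = random.Random(0)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
x_full = [zi ** 2 + rng.gauss(0, 1) for zi in z]
x = [xi if rng.random() > 0.3 else None for xi in x_full]  # about 30% missing
y = [1.0 + 0.5 * zi + 1.0 * xi + rng.gauss(0, 1) for zi, xi in zip(z, x_full)]

stat, p_value = impute_then_score_test(y, z, x)
print(f"score statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

In this toy setup the imputed values depend only on z, so they are uncorrelated with the null-model residuals when the phenotype truly has no association with x. That is the intuition the sketch tries to convey; the paper's estimator and variance formula are more general.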

Main Results:

  • The proposed score statistic is asymptotically unbiased under general missing-data mechanisms, even with misspecified imputation models.
  • The method is computationally feasible for complex imputation models.
  • Demonstrated advantages over existing methods via simulations and a cancer genomics application.
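
For orientation, a generic score statistic for a single genomic variable can be written as follows. This is the textbook form under an assumed linear phenotype model Y_i = α'Z_i + βX_i + ε_i with E[ε_i | Z_i] = 0, where X̃_i denotes the observed or imputed value of X_i; the notation is an assumption of this sketch and the paper's statistic may differ in detail.

```latex
U = \sum_{i=1}^{n} \bigl(Y_i - \hat{\alpha}^{\top} Z_i\bigr)\,\tilde{X}_i ,
\qquad
T = \frac{U^{2}}{\widehat{V}} \;\overset{H_0}{\approx}\; \chi^{2}_{1} ,
\qquad
\mathrm{E}\bigl[(Y_i - \alpha^{\top} Z_i)\, g(Z_i)\bigr] = 0 \ \text{ under } H_0:\ \beta = 0 .
```

The last identity holds for any function g of the covariates, which suggests why, in this simplified setting, replacing a missing X_i with an imputed value that depends only on Z_i leaves U centered at zero under the null even if g is the wrong imputation model. Here V̂ stands for a consistent estimate of the variance of U.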

Conclusions:

  • The developed robust score statistic offers a reliable approach for analyzing genomic data with missing values.
  • This method enhances the accuracy and feasibility of genetic association studies in complex datasets.