Updated: Aug 2, 2025

High-dimensional principal component analysis with heterogeneous missingness.

Ziwei Zhu¹,², Tengyao Wang¹,³, Richard J. Samworth¹

¹Statistical Laboratory, University of Cambridge, Cambridge, UK.

Journal of the Royal Statistical Society. Series B, Statistical Methodology

April 17, 2023

Summary

This summary is machine-generated.

We introduce primePCA, a novel method for high-dimensional Principal Component Analysis (PCA) with missing data. It outperforms existing estimators, especially with heterogeneous missingness, achieving accurate principal component recovery.

Keywords:

heterogeneous missingness; high-dimensional statistics; iterative projections; missing data; principal component analysis


Area of Science:

Statistics
Machine Learning
Data Science

Background:

High-dimensional Principal Component Analysis (PCA) is crucial for dimensionality reduction.
Missing observations pose a significant challenge in PCA, degrading estimator performance.
Existing methods like observed-proportion weighted (OPW) estimators struggle with heterogeneous missingness.
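
As a rough illustration of the observed-proportion weighting idea mentioned above, the sketch below rescales each entry of the zero-filled Gram matrix by the proportion of rows in which both coordinates are observed, then takes the leading eigenvectors. The function name `pairwise_weighted_pca`, the NaN encoding of missing entries, and the assumption of mean-zero columns are illustrative choices, not details taken from the paper.

```python
import numpy as np


def pairwise_weighted_pca(Y, K):
    """Leading K eigenvectors of a pairwise-weighted covariance estimate.

    Y is an (n, d) array with np.nan marking missing entries; columns are
    assumed to be mean-zero (an assumption of this sketch).
    """
    observed = ~np.isnan(Y)
    Y0 = np.where(observed, Y, 0.0)          # zero-fill missing entries
    n = Y.shape[0]

    gram = Y0.T @ Y0 / n                     # biased towards zero by the zero-fill
    obs = observed.astype(float)
    joint = obs.T @ obs / n                  # proportion of rows with j and k both observed

    # Inverse-proportion weights; pairs that are never jointly observed get weight 0.
    weights = np.divide(1.0, joint, out=np.zeros_like(joint), where=joint > 0)
    G = gram * weights

    evals, evecs = np.linalg.eigh(G)         # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:K]
    return evecs[:, top]
```

With fully observed data the weights are all one, so the result reduces to ordinary PCA on the sample covariance. When observation rates differ strongly across columns, entries of the weighted matrix are estimated from very different numbers of rows, which is one way to see why heterogeneous missingness is harder for this kind of estimator.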

Purpose of the Study:

To develop a robust method for high-dimensional PCA that effectively handles heterogeneous missing data.
To improve the accuracy and reliability of principal component estimation in the presence of missing observations.
To address the limitations of current estimators, particularly in realistic, non-uniform missing data scenarios.

Main Methods:

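Introduced primePCA, a method that alternates between imputing missing entries and re-estimating the leading singular space.
Used the observed-proportion weighted (OPW) estimator as the starting point.
At each iteration, imputed missing entries by projecting the observed entries onto the column space of the current estimate, then updated the estimate from a singular value decomposition of the imputed data matrix.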
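
The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' reference implementation: the names `init_subspace`, `refine_once`, `iterative_impute_pca` and `subspace_distance` are hypothetical, missing entries are encoded as NaN, columns are assumed mean-zero, the initializer is a plain pairwise-weighted covariance estimate standing in for the OPW estimator, and the loop runs for a fixed number of iterations without any of the safeguards a full method would need.

```python
import numpy as np


def init_subspace(Y, K):
    """Starting point: top-K eigenvectors of a pairwise-weighted covariance."""
    observed = ~np.isnan(Y)
    Y0 = np.where(observed, Y, 0.0)
    obs = observed.astype(float)
    joint = obs.T @ obs                       # counts of jointly observed rows
    G = np.divide(Y0.T @ Y0, joint, out=np.zeros_like(joint), where=joint > 0)
    _, evecs = np.linalg.eigh(G)              # ascending eigenvalues
    return evecs[:, ::-1][:, :K]


def refine_once(Y, V):
    """Impute each row from the current subspace V, then re-estimate V by SVD."""
    observed = ~np.isnan(Y)
    Y_imp = np.where(observed, Y, 0.0)
    for i in range(Y.shape[0]):
        obs_i = observed[i]
        if obs_i.all() or not obs_i.any():
            continue                          # nothing to impute, or nothing to fit
        # Least-squares scores using only the observed coordinates of row i.
        u, *_ = np.linalg.lstsq(V[obs_i], Y[i, obs_i], rcond=None)
        Y_imp[i, ~obs_i] = V[~obs_i] @ u      # fill in the missing coordinates
    _, _, Vt = np.linalg.svd(Y_imp, full_matrices=False)
    return Vt[: V.shape[1]].T                 # leading right singular vectors


def iterative_impute_pca(Y, K, n_iter=30):
    V = init_subspace(Y, K)
    for _ in range(n_iter):
        V = refine_once(Y, V)
    return V


def subspace_distance(V1, V2):
    """Frobenius distance between the projection matrices of two subspaces."""
    return np.linalg.norm(V1 @ V1.T - V2 @ V2.T)


# Demo on noiseless rank-2 data with column-specific observation probabilities.
rng = np.random.default_rng(1)
n, d, K = 200, 20, 2
V_true, _ = np.linalg.qr(rng.standard_normal((d, K)))
Y = (rng.standard_normal((n, K)) * np.array([3.0, 2.0])) @ V_true.T
obs_prob = np.linspace(0.6, 0.95, d)          # heterogeneous across columns
Y[rng.random((n, d)) > obs_prob] = np.nan

dist_init = subspace_distance(init_subspace(Y, K), V_true)
V_hat = iterative_impute_pca(Y, K)
dist_final = subspace_distance(V_hat, V_true)
print(f"initial distance: {dist_init:.4f}, after refinement: {dist_final:.2e}")
```

Comparing projection matrices rather than the basis vectors themselves makes the error measure insensitive to sign flips and rotations within the estimated subspace.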

Main Results:

primePCA's estimation error converges to zero at a geometric rate in the noiseless case, provided the signal strength is sufficient.
The method shows improved empirical performance over OPW, especially with heterogeneous missing data.
Theoretical guarantees depend on average missingness properties, not worst-case scenarios.
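
For readers unfamiliar with the term, a geometric rate of convergence can be written generically as below. The symbols are assumptions introduced here for illustration (a loss $L$ between subspaces, the estimate $\hat{V}^{(t)}$ after $t$ iterations, the target $V_K$, and a contraction factor $\rho$); this is the generic shape of such a statement, not the paper's theorem or its constants.

```latex
L\bigl(\hat{V}^{(t)}, V_K\bigr) \;\le\; \rho^{\,t}\, L\bigl(\hat{V}^{(0)}, V_K\bigr),
\qquad 0 < \rho < 1, \quad t = 0, 1, 2, \dots
```

In words, each iteration shrinks the error by at least a fixed factor, so the number of correct digits grows roughly linearly with the number of iterations.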

Conclusions:

primePCA offers a significant advancement for high-dimensional PCA with missing data, particularly in heterogeneous settings.
The method provides accurate principal component recovery where previous approaches failed.
Numerical studies confirm primePCA's effectiveness on both simulated and real-world datasets, even when data are not Missing Completely At Random.