Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference.

Shuting Shen¹, Junwei Lu², Xihong Lin³

¹National University of Singapore.

Journal of the American Statistical Association

June 17, 2026

Summary

This summary is machine-generated.

We introduce FAst DIstributed (FADI) PCA, a novel method for analyzing large-scale federated data. FADI efficiently handles high dimensionality and massive sample sizes, overcoming limitations of traditional Principal Component Analysis (PCA).

Keywords:

Computational efficiency; Distributed computing; Fast PCA; Large-scale inference; Random sketches

Area of Science:

Statistics
Machine Learning
Computational Biology

Background:

Traditional Principal Component Analysis (PCA) is difficult to apply to large-scale federated data because of privacy constraints and computational cost.
Existing distributed algorithms often struggle with both high dimensionality and massive sample sizes simultaneously.

Purpose of the Study:

To propose a novel distributed PCA method (FADI) for federated data with ultra-large dimension and sample size.
To develop a general framework for statistical problems in distributed settings.
To analyze the computational efficiency and error rates of the proposed method.

Main Methods:

Developed FAst DIstributed (FADI) PCA by combining parallel computing along dimensions and distributed computing along samples.
Utilized L parallel copies of p-dimensional fast sketches to divide the computational burden.
Established a general theoretical framework with comprehensive results for statistical problems.
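
The combination of sample-wise distribution and L parallel p-dimensional sketches described above can be illustrated with a small NumPy sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: it assumes centered data, uses Gaussian test matrices as the sketch (the summary does not say which sketch is used), and aggregates the L sketched subspaces by averaging their projection matrices. The names `sketched_distributed_pca` and `make_spiked_sites` are hypothetical.

```python
import numpy as np


def sketched_distributed_pca(site_data, K, p, L, seed=0):
    """Estimate the top-K principal subspace from data split across sites.

    site_data: list of (n_s, d) arrays, one per site, assumed centered.
    p: sketch dimension (p >= K); L: number of independent sketches.
    Returns a (d, K) matrix with orthonormal columns.
    """
    rng = np.random.default_rng(seed)
    d = site_data[0].shape[1]
    n_total = sum(X.shape[0] for X in site_data)
    proj_sum = np.zeros((d, d))
    for _ in range(L):  # the L sketches are independent and could run in parallel
        omega = rng.standard_normal((d, p))  # assumed Gaussian test matrix
        # Each site sends only a d x p product, never its raw samples
        # or a d x d covariance matrix.
        Y = sum(X.T @ (X @ omega) for X in site_data) / n_total
        U, _, _ = np.linalg.svd(Y, full_matrices=False)
        V_l = U[:, :K]  # top-K left singular vectors of this sketch
        proj_sum += V_l @ V_l.T
    # Aggregate the L subspace estimates via the averaged projection matrix.
    # Forming a d x d matrix here is for clarity only.
    _, eigvecs = np.linalg.eigh(proj_sum / L)
    return eigvecs[:, -K:][:, ::-1]


def make_spiked_sites(n_sites=4, n_per_site=500, d=30,
                      scales=(30.0, 25.0, 20.0), seed=1):
    """Hypothetical toy data: a low-rank signal plus unit-variance noise."""
    rng = np.random.default_rng(seed)
    K = len(scales)
    V, _ = np.linalg.qr(rng.standard_normal((d, K)))  # true principal basis
    sites = []
    for _ in range(n_sites):
        Z = rng.standard_normal((n_per_site, K)) * np.asarray(scales)
        sites.append(Z @ V.T + rng.standard_normal((n_per_site, d)))
    return sites, V


if __name__ == "__main__":
    sites, V_true = make_spiked_sites()
    V_hat = sketched_distributed_pca(sites, K=3, p=10, L=3)  # L * p = d = 30
    err = np.linalg.norm(V_hat @ V_hat.T - V_true @ V_true.T, 2)
    print(f"subspace error (spectral norm of projection difference): {err:.4f}")
```

In this toy setup each site communicates only d x p matrices per sketch, which is the sense in which the sketch size p and the number of copies L split the work; the choice L * p = d in the usage example mirrors the regime named in the results, but the toy run is not evidence for the paper's error-rate claims.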

Main Results:

FADI accelerates computation while maintaining the same non-asymptotic error rate as traditional PCA when L*p >= d.
Derived inferential results characterizing the asymptotic distribution of FADI.
Observed a phase-transition phenomenon as L*p increases.

Conclusions:

FADI offers an efficient solution for PCA on ultra-large federated datasets.
The method provides theoretical guarantees on error rates and asymptotic distributions.
FADI was successfully applied to analyze population structure in the 1000 Genomes data.