Dimension Reduction for Large-Scale Federated Data: Statistical Rate and Asymptotic Inference.

Shuting Shen¹, Junwei Lu², Xihong Lin³

  • ¹ National University of Singapore.

Journal of the American Statistical Association | June 17, 2026
Summary
This summary is machine-generated.

We introduce FAst DIstributed (FADI) principal component analysis (PCA), a method for analyzing large-scale federated data. FADI efficiently handles data that have both high dimensionality and massive sample sizes, overcoming limitations of traditional PCA.

Keywords:
Computational efficiency; Distributed computing; Fast PCA; Large-scale inference; Random sketches

Area of Science:

  • Statistics
  • Machine Learning
  • Computational Biology

Background:

  • Traditional principal component analysis (PCA) is difficult to apply to large-scale federated data because of privacy constraints and computational cost.
  • Existing distributed algorithms often struggle with both high dimensionality and massive sample sizes simultaneously.
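As a point of reference for the background above, the following minimal sketch (our illustration, not an algorithm from the paper) shows a naive way to run PCA on data split across machines: each machine sends its local Gram matrix X_s^T X_s to a central server, which sums and eigendecomposes it. This copes with a large total sample size n, but each machine transmits d² numbers and the final eigendecomposition costs O(d³), which becomes prohibitive when the dimension d is also large. The helper name `pooled_pca` is hypothetical, and the data columns are assumed to be centered already.

```python
import numpy as np

def pooled_pca(blocks, K):
    """Top-K principal directions from row-blocks held on separate machines.

    blocks : list of (n_s x d) arrays with centered columns (assumption).
    Each machine contributes its local Gram matrix X_s^T X_s (d x d); the
    center sums them, which equals X^T X for the stacked data.
    """
    d = blocks[0].shape[1]
    n = sum(b.shape[0] for b in blocks)
    gram = np.zeros((d, d))
    for b in blocks:
        gram += b.T @ b                      # O(n_s d^2) work, d^2 numbers sent per machine
    cov = gram / n
    eigvals, eigvecs = np.linalg.eigh(cov)   # O(d^3); eigenvalues come back in ascending order
    order = np.argsort(eigvals)[::-1][:K]    # indices of the K largest eigenvalues
    return eigvecs[:, order], eigvals[order]
```

Because the summed Gram matrix equals X^T X for the pooled rows, this returns the same principal directions as ordinary PCA on the stacked data; only the communication pattern differs.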

Purpose of the Study:

  • To propose a distributed PCA method (FADI) for federated data with ultra-large dimension and sample size.
  • To develop a general framework for statistical problems in distributed settings.
  • To analyze the computational efficiency and error rates of the proposed method.

Main Methods:

  • Developed FADI PCA, which combines parallel computing along the dimension with distributed computing along the samples.
  • Divided the computational burden across L parallel copies of p-dimensional fast sketches.
  • Established a general theoretical framework, with comprehensive results, for statistical problems in distributed settings.
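The methods above can be illustrated with a simplified sketch in the spirit of FADI. It rests on our own assumptions and is not the paper's exact algorithm: each of the L sketches uses a Gaussian test matrix Ω of width p ≥ K shared by all machines; each machine returns only X_s^T (X_s Ω); the center takes the top-K left singular vectors of each aggregated sketch and averages the resulting projection matrices; and the data columns are centered. The function name `fadi_like_pca` and its parameters are hypothetical.

```python
import numpy as np

def fadi_like_pca(blocks, K, p, L, seed=0):
    """Simplified, FADI-inspired estimate of the top-K principal subspace.

    blocks : list of (n_s x d) arrays, one per machine, columns assumed centered.
    K      : target rank; p : sketch width (p >= K); L : number of sketches.
    """
    rng = np.random.default_rng(seed)
    d = blocks[0].shape[1]
    n = sum(b.shape[0] for b in blocks)
    P_bar = np.zeros((d, d))
    for _ in range(L):                          # the L sketches are independent, so they could run in parallel
        omega = rng.standard_normal((d, p))     # Gaussian test matrix shared by all machines
        # Each machine sends only X_s^T (X_s omega), a d x p matrix; the center sums them.
        Y = sum(b.T @ (b @ omega) for b in blocks) / n   # equals Sigma_hat @ omega
        U, _, _ = np.linalg.svd(Y, full_matrices=False)
        V_l = U[:, :K]                          # top-K left singular vectors of this sketch
        P_bar += V_l @ V_l.T / L                # average the projection matrices
    # For readability, the d x d average is eigendecomposed directly here;
    # a scalable version would avoid forming this matrix.
    eigvals, eigvecs = np.linalg.eigh(P_bar)
    return eigvecs[:, np.argsort(eigvals)[::-1][:K]]
```

Averaging projection matrices rather than eigenvectors sidesteps the sign and rotation ambiguity of each sketch's singular vectors, and only d × p matrices travel between the machines and the center, compared with d × d in a naive Gram-matrix approach.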

Main Results:

  • FADI accelerates computation while maintaining the same non-asymptotic error rate as traditional PCA when Lp ≥ d, where d is the data dimension.
  • Derived inferential results characterizing the asymptotic distribution of the FADI estimator.
  • Observed a phase-transition phenomenon as Lp increases.

Conclusions:

  • FADI offers an efficient solution for PCA on ultra-large federated datasets.
  • The method provides theoretical guarantees on error rates and asymptotic distributions.
  • FADI was successfully applied to analyze population structure in the 1000 Genomes data.
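The population-structure application rests on PCA of a genotype matrix. The following sketch shows that step on simulated data; it is neither the 1000 Genomes data nor the authors' pipeline. Genotypes coded as 0/1/2 allele counts are standardized per SNP in the widely used EIGENSOFT style (center by 2f, scale by √(2f(1 − f)), with f the sample allele frequency), and the leading singular vectors give each individual's PC scores. `genotype_pcs` is a hypothetical helper; at 1000 Genomes scale, this dense SVD is the kind of computation a distributed method such as FADI is meant to replace.

```python
import numpy as np

def genotype_pcs(G, K):
    """Top-K principal component scores of an (individuals x SNPs) genotype matrix.

    G holds 0/1/2 allele counts. Each SNP is centered by 2*f and scaled by
    sqrt(2*f*(1-f)), where f is its sample allele frequency.
    """
    G = np.asarray(G, dtype=float)
    f = G.mean(axis=0) / 2.0
    keep = (f > 0) & (f < 1)                 # drop monomorphic SNPs (zero variance)
    Z = (G[:, keep] - 2 * f[keep]) / np.sqrt(2 * f[keep] * (1 - f[keep]))
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :K] * s[:K]                  # PC scores, one row per individual
```

With two simulated populations whose allele frequencies differ substantially, the first PC score should separate individuals by population, which is the pattern PCA-based population-structure analyses look for.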