The iterated score regression estimation algorithm for PCA-based missing data with high correlation.

Guangbao Guo¹, Haoyue Song², Lixing Zhu³,⁴

  • ¹School of Mathematics and Statistics, Shandong University of Technology, Zibo, China. ggb11111111@163.com.

Scientific Reports | March 18, 2025
Summary
This summary is machine-generated.

We introduce iterated score regression, a new algorithm for imputing missing data in principal component analysis (PCA) when the variables are highly correlated. In the reported comparisons, the method shows better accuracy and stability than existing techniques.

Keywords:
High correlation; Iterated score regression; Missing data; Principal component analysis; Sensitivity

Area of Science:

  • Statistics
  • Data Science
  • Machine Learning

Background:

  • Missing data poses challenges in statistical analyses, particularly within Principal Component Analysis (PCA).
  • High correlations among variables complicate imputation methods for missing data.
  • Existing imputation algorithms may not perform optimally under high correlation scenarios.

Purpose of the Study:

  • To propose a novel imputation algorithm, iterated score regression, for handling missing data in PCA when the variables are highly correlated.
  • To evaluate the stability and accuracy of the proposed algorithm.
  • To compare the performance of iterated score regression against modified versions of existing algorithms.

Main Methods:

  • Development of the iterated score regression algorithm using a transformation matrix to separate missing and observed data.
  • Construction of regression equations based on the data blocks, the score matrix, and the PCA model.
  • Sensitivity analysis examining the effects of standard deviations, correlation coefficients, missing proportions, numbers of variables, and sample sizes.
  • Modification and comparison with three existing imputation algorithms.
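The summary does not give the iterated score regression equations, so what follows is only a minimal sketch of the general idea of PCA imputation by score regression, not the authors' algorithm. Our assumptions: a single principal component, a column-mean starting fill, the mean μ and unit loading v re-estimated from the completed data on every pass, and each row's score t estimated by least squares from its observed entries only, t = v_o·(x_o − μ_o) / (v_o·v_o), after which the missing entries are set to x_m = μ_m + t·v_m. Splitting each row into observed (o) and missing (m) index sets stands in for a permutation-type transformation matrix; whether the paper's transformation matrix has that form is an assumption. The names `column_means`, `leading_loading` and `impute_one_component` are hypothetical.

```python
import math


def column_means(X, mask):
    """Column means over observed entries only; mask[i][j] is True where X[i][j] is missing."""
    n, p = len(X), len(X[0])
    means = []
    for j in range(p):
        vals = [X[i][j] for i in range(n) if not mask[i][j]]
        means.append(sum(vals) / len(vals))  # assumes every column has an observed value
    return means


def leading_loading(Z, mu, iters=200):
    """Unit-length leading PCA loading of the completed data Z (power iteration on the covariance)."""
    n, p = len(Z), len(Z[0])
    C = [[sum((Z[i][a] - mu[a]) * (Z[i][b] - mu[b]) for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def impute_one_component(X, mask, max_iter=500, tol=1e-10):
    """Hypothetical one-component PCA imputation by score regression (illustrative only)."""
    n, p = len(X), len(X[0])
    start = column_means(X, mask)
    # Step 0: fill every missing entry with its observed column mean.
    Z = [[start[j] if mask[i][j] else X[i][j] for j in range(p)] for i in range(n)]
    for _ in range(max_iter):
        # Step 1: refit the PCA model (mean and leading loading) on the completed data.
        mu = [sum(Z[i][j] for i in range(n)) / n for j in range(p)]
        v = leading_loading(Z, mu)
        change = 0.0
        for i in range(n):
            obs = [j for j in range(p) if not mask[i][j]]
            mis = [j for j in range(p) if mask[i][j]]
            if not obs or not mis:
                continue  # nothing to impute, or nothing to regress on
            # Step 2: score regression, i.e. the least-squares score from observed entries only.
            t = (sum(v[j] * (Z[i][j] - mu[j]) for j in obs)
                 / sum(v[j] * v[j] for j in obs))
            # Step 3: replace the missing entries with the model reconstruction.
            for j in mis:
                new = mu[j] + t * v[j]
                change = max(change, abs(new - Z[i][j]))
                Z[i][j] = new
        if change < tol:
            break
    return Z


if __name__ == "__main__":
    # Toy rank-one data: row s is s * (1, 2, 3); two entries are hidden.
    X = [[s, 2 * s, 3 * s] for s in range(1, 7)]
    X[0][2] = None  # true value 3
    X[3][1] = None  # true value 8
    mask = [[x is None for x in row] for row in X]
    Z = impute_one_component(X, mask)
    print(round(Z[0][2], 4), round(Z[3][1], 4))
```

Each pass alternates two least-squares steps on the same squared reconstruction error (fit μ and v given the completed data, then fit the scores and missing entries given μ and v), so the error should not grow from pass to pass apart from power-iteration inaccuracy. With k > 1 components, the scalar division in Step 2 would become a k×k least-squares solve using the observed rows of the loading matrix.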

Main Results:

  • The iterated score regression algorithm consistently achieved the smallest mean squared error (MSE) among the compared methods.
  • The algorithm demonstrated stability and accuracy across various tested conditions.
  • Numerical studies and illustrations on real-world data sets confirmed the algorithm's advantages.
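The sensitivity analysis and MSE comparison described above can be illustrated with a small simulation harness. This is a sketch under assumptions the summary does not confirm: equicorrelated Gaussian variables built from one common factor, entries hidden completely at random, and MSE computed over the hidden entries only. Column-mean imputation is used purely as a simple stand-in baseline; it is not one of the three algorithms the paper compares, and no results from the paper are reproduced here. The function names are hypothetical.

```python
import random


def simulate_equicorrelated(n, p, rho, sd=1.0, seed=0):
    """n rows of p Gaussian variables with common pairwise correlation rho (0 <= rho < 1).

    Assumed one-factor construction: x_j = sd * (sqrt(rho) * f + sqrt(1 - rho) * e_j),
    with f and every e_j independent standard normal draws.
    """
    rng = random.Random(seed)
    X = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)  # factor shared by all variables in the row
        X.append([sd * (rho ** 0.5 * f + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0))
                  for _ in range(p)])
    return X


def mcar_mask(n, p, proportion, seed=1):
    """Hide each entry independently with the given probability (missing completely at random)."""
    rng = random.Random(seed)
    return [[rng.random() < proportion for _ in range(p)] for _ in range(n)]


def mse_on_missing(X_true, X_imputed, mask):
    """Mean squared error over the hidden entries only."""
    errs = [(X_imputed[i][j] - X_true[i][j]) ** 2
            for i in range(len(X_true)) for j in range(len(X_true[0])) if mask[i][j]]
    return sum(errs) / len(errs)


def mean_impute(X, mask):
    """Stand-in baseline: replace each hidden entry by its observed column mean."""
    n, p = len(X), len(X[0])
    means = []
    for j in range(p):
        obs = [X[i][j] for i in range(n) if not mask[i][j]]
        means.append(sum(obs) / len(obs))
    return [[means[j] if mask[i][j] else X[i][j] for j in range(p)] for i in range(n)]


if __name__ == "__main__":
    # A tiny sensitivity grid over correlation and missing proportion;
    # n, p and sd could be varied the same way.
    for rho in (0.5, 0.9):
        for proportion in (0.05, 0.2):
            X = simulate_equicorrelated(n=200, p=5, rho=rho)
            mask = mcar_mask(200, 5, proportion)
            print(rho, proportion, round(mse_on_missing(X, mean_impute(X, mask), mask), 4))
```

Any candidate imputer with the same (X, mask) to completed-matrix signature can be dropped in place of `mean_impute` so that all methods are scored on identical simulated data and masks.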

Conclusions:

  • Iterated score regression is an effective imputation method for PCA when the variables containing missing values are highly correlated.
  • The algorithm offers improved accuracy and stability over existing approaches.
  • The proposed method provides a valuable tool for addressing complex missing data scenarios in statistical modeling.