The more data, the better? Demystifying deletion-based methods in linear regression with missing data.

Tianchen Xu¹, Kun Chen², Gen Li³

¹Mailman School of Public Health, Columbia University, New York, NY, 10032, USA.

Statistics and Its Interface

December 21, 2022

Summary

This summary is machine-generated.

Complete-case analysis and available-case analysis are compared for linear regression with missing data. Using more data, as available-case analysis does, does not always improve efficiency: the missing-data pattern and the structure of the data both affect the result.

Keywords:

Asymptotic variance; Available-case analysis; Complete-case analysis; Missing data; Primary 62D10, 62J05; Secondary 62F12

Area of Science:

Statistics
Econometrics
Data Science

Background:

Missing data is a common challenge in statistical analysis.
Deletion methods like complete-case analysis (CC) and available-case analysis (AC) are used to handle missing observations in linear regression.
Understanding the efficiency of these methods is crucial for accurate analysis.

Purpose of the Study:

To compare the asymptotic properties of complete-case analysis (CC) and available-case analysis (AC) for linear regression with missing data.
To investigate the factors influencing the efficiency of these deletion methods.
To clarify potential misinterpretations in existing literature regarding these methods.

Main Methods:

The study theoretically compares complete-case analysis (listwise deletion) and available-case analysis (pairwise deletion).
Asymptotic unbiasedness and variances of estimates from both methods are analyzed under the Missing Completely At Random (MCAR) assumption.
Simulation studies are conducted to validate theoretical findings.
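
The two deletion methods named above can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes missing values are encoded as NaN, and it uses one common variant of available-case analysis, in which every mean, variance, and covariance is computed from all rows where the variables involved are observed and the slopes are then solved from those moments. The function names (complete_case_ols, pairwise_cov, available_case_ols) are hypothetical.

```python
import numpy as np

def complete_case_ols(X, y):
    """Listwise deletion: keep only fully observed rows, then fit OLS."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    design = np.column_stack([np.ones(keep.sum()), X[keep]])
    beta, *_ = np.linalg.lstsq(design, y[keep], rcond=None)
    return beta  # [intercept, slope_1, ..., slope_p]

def pairwise_cov(a, b):
    """Covariance of a and b using every row where both are observed."""
    ok = ~np.isnan(a) & ~np.isnan(b)
    a, b = a[ok], b[ok]
    return np.mean((a - a.mean()) * (b - b.mean()))

def available_case_ols(X, y):
    """Pairwise deletion (assumed variant): build the moment matrix
    entry by entry from available pairs, then solve for the slopes."""
    Z = np.column_stack([X, y])
    p = X.shape[1]
    S = np.array([[pairwise_cov(Z[:, i], Z[:, j]) for j in range(p + 1)]
                  for i in range(p + 1)])
    slopes = np.linalg.solve(S[:p, :p], S[:p, p])
    means = np.nanmean(Z, axis=0)  # each mean uses all observed values
    intercept = means[p] - means[:p] @ slopes
    return np.concatenate([[intercept], slopes])
```

With no missing values the two functions return the same coefficients. They diverge once some entries are NaN, because available-case analysis retains partially observed rows that complete-case analysis discards.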

Main Results:

Both complete-case analysis and available-case analysis yield asymptotically unbiased estimates under MCAR.
Available-case analysis does not consistently offer better asymptotic efficiency than complete-case analysis.
The relative efficiency depends on missing data patterns, covariance structure, and true regression coefficients.
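
A small Monte Carlo experiment is one way to explore these results numerically. The sketch below is an illustration under assumed settings, not the simulation design of the paper: two correlated Gaussian predictors, values of the second predictor deleted completely at random, and hypothetical parameters n, reps, rho, beta, and p_miss. It returns the empirical mean and variance of the slope estimates for each method so that their efficiency can be compared for a chosen configuration.

```python
import numpy as np

def cc_slopes(X, y):
    """Complete-case slopes: OLS on fully observed rows (centered data)."""
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    Xc = X[keep] - X[keep].mean(axis=0)
    yc = y[keep] - y[keep].mean()
    return np.linalg.lstsq(Xc, yc, rcond=None)[0]

def ac_slopes(X, y):
    """Available-case slopes from pairwise-deletion moments (assumed variant)."""
    Z = np.column_stack([X, y])
    k = Z.shape[1]
    S = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ok = ~np.isnan(Z[:, i]) & ~np.isnan(Z[:, j])
            a, b = Z[ok, i], Z[ok, j]
            S[i, j] = np.mean((a - a.mean()) * (b - b.mean()))
    return np.linalg.solve(S[:-1, :-1], S[:-1, -1])

def simulate(n=200, reps=500, rho=0.5, beta=(1.0, 1.0), p_miss=0.3, seed=0):
    """Empirical mean and variance of both estimators under MCAR."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    beta = np.asarray(beta, dtype=float)
    estimates = {"cc": [], "ac": []}
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ beta + rng.normal(size=n)
        # MCAR: the deletion mask is drawn independently of X and y
        X[rng.random(n) < p_miss, 1] = np.nan
        estimates["cc"].append(cc_slopes(X, y))
        estimates["ac"].append(ac_slopes(X, y))
    return {name: (np.mean(vals, axis=0), np.var(vals, axis=0))
            for name, vals in estimates.items()}

if __name__ == "__main__":
    for name, (mean, var) in simulate().items():
        print(name, "mean:", mean, "variance:", var)
```

Rerunning simulate with different values of rho, beta, and p_miss varies the covariance structure, the true coefficients, and the missing-data pattern, which are the factors the results above identify as driving relative efficiency.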

Conclusions:

The choice between complete-case and available-case analysis for linear regression with missing data is complex.
Efficiency is not solely determined by the amount of data used; missing data characteristics are critical.
Further research and careful consideration of data properties are needed when applying these deletion methods.