



The more data, the better? Demystifying deletion-based methods in linear regression with missing data.

Tianchen Xu (1), Kun Chen (2), Gen Li (3)

  • (1) Mailman School of Public Health, Columbia University, New York, NY, 10032, USA.

Statistics and Its Interface | December 21, 2022
Summary
This summary is machine-generated.

Complete-case analysis and available-case analysis are compared for linear regression with missing data. Using more of the data (available-case analysis) does not always improve efficiency: the missing-data pattern, the covariance structure, and the true regression coefficients all affect which method is more efficient.

Keywords:
Asymptotic variance; Available-case analysis; Complete-case analysis; Missing data; Primary 62D10, 62J05; Secondary 62F12


Area of Science:

  • Statistics
  • Econometrics
  • Data Science

Background:

  • Missing data is a common challenge in statistical analysis.
  • Deletion methods like complete-case analysis (CC) and available-case analysis (AC) are used to handle missing observations in linear regression.
  • Understanding the efficiency of these methods is crucial for accurate analysis.

Purpose of the Study:

  • To compare the asymptotic properties of complete-case analysis (CC) and available-case analysis (AC) for linear regression with missing data.
  • To investigate the factors influencing the efficiency of these deletion methods.
  • To clarify potential misinterpretations in existing literature regarding these methods.

Main Methods:

  • The study theoretically compares complete-case analysis (listwise deletion) and available-case analysis (pairwise deletion).
  • Asymptotic unbiasedness and variances of estimates from both methods are analyzed under the Missing Completely At Random (MCAR) assumption.
  • Simulation studies are conducted to validate theoretical findings.
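
The two deletion methods compared above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `cc_slopes` and `ac_slopes`, the toy data, and the choice to build the available-case estimate from pandas' pairwise-complete covariance matrix are all assumptions made for the example. The paper's exact available-case estimator may differ in how means and covariances are formed.

```python
import numpy as np
import pandas as pd


def cc_slopes(df, y, xs):
    """Complete-case (listwise deletion): drop every row with any missing
    value, then fit ordinary least squares with an intercept."""
    d = df[[y] + xs].dropna()
    design = np.column_stack([np.ones(len(d)), d[xs].to_numpy()])
    coef, *_ = np.linalg.lstsq(design, d[y].to_numpy(), rcond=None)
    return coef[1:]  # slopes only, intercept dropped


def ac_slopes(df, y, xs):
    """Available-case (pairwise deletion): each covariance entry uses all
    rows in which that particular pair of variables is observed."""
    S = df[[y] + xs].cov()  # DataFrame.cov excludes NA pair by pair
    Sxx = S.loc[xs, xs].to_numpy()
    sxy = S.loc[xs, y].to_numpy()
    return np.linalg.solve(Sxx, sxy)


# Toy data (hypothetical values): two correlated predictors, x2 missing
# completely at random (MCAR) in about 30% of rows.
rng = np.random.default_rng(0)
n = 20_000
beta_true = np.array([1.0, -0.5])
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)
y = X @ beta_true + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x1": X[:, 0], "x2": X[:, 1]})
df.loc[rng.random(n) < 0.3, "x2"] = np.nan  # missingness independent of data

beta_cc = cc_slopes(df, "y", ["x1", "x2"])
beta_ac = ac_slopes(df, "y", ["x1", "x2"])
print("CC slopes:", beta_cc)
print("AC slopes:", beta_ac)
```

In this sketch the complete-case fit discards every row where `x2` is missing, while the available-case fit still uses those rows for the entries involving only `y` and `x1`. With a large sample under MCAR, both sets of slopes should land near `beta_true`.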

Main Results:

  • Both complete-case analysis and available-case analysis yield asymptotically unbiased estimates under MCAR.
  • Available-case analysis does not consistently offer better asymptotic efficiency than complete-case analysis.
  • The relative efficiency depends on missing data patterns, covariance structure, and true regression coefficients.
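
One way to explore these results empirically is a small Monte Carlo experiment that records the sampling variance of each estimator. The sketch below is an assumption-laden illustration rather than the study's simulation design: the function names, the two-predictor model, and the default values of `rho`, `p_miss`, and `beta` are all hypothetical, and the available-case estimate again relies on pandas' pairwise-complete covariances.

```python
import numpy as np
import pandas as pd


def one_replication(rng, n, beta, rho, p_miss):
    """Simulate one MCAR data set and return (CC slopes, AC slopes)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    y = X @ beta + rng.normal(size=n)
    df = pd.DataFrame({"y": y, "x1": X[:, 0], "x2": X[:, 1]})
    df.loc[rng.random(n) < p_miss, "x2"] = np.nan  # MCAR mask on x2 only

    xs = ["x1", "x2"]
    # Complete-case: listwise deletion followed by OLS with an intercept.
    d = df.dropna()
    design = np.column_stack([np.ones(len(d)), d[xs].to_numpy()])
    cc = np.linalg.lstsq(design, d["y"].to_numpy(), rcond=None)[0][1:]
    # Available-case: slopes from the pairwise-complete covariance matrix.
    S = df.cov()
    ac = np.linalg.solve(S.loc[xs, xs].to_numpy(), S.loc[xs, "y"].to_numpy())
    return cc, ac


def compare(n=500, reps=200, beta=(1.0, -0.5), rho=0.5, p_miss=0.3, seed=0):
    """Empirical mean and variance of the CC and AC slopes over `reps` runs."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    draws = np.array(
        [one_replication(rng, n, beta, rho, p_miss) for _ in range(reps)]
    )  # shape: (reps, 2 methods, 2 slopes)
    return {
        "cc_mean": draws[:, 0].mean(axis=0),
        "ac_mean": draws[:, 1].mean(axis=0),
        "cc_var": draws[:, 0].var(axis=0, ddof=1),
        "ac_var": draws[:, 1].var(axis=0, ddof=1),
    }


result = compare()
for name, value in result.items():
    print(name, value)
```

Rerunning `compare` with different values of `rho`, `p_miss`, and `beta` changes the ratio of `ac_var` to `cc_var`, which gives a rough way to probe how the relative efficiency depends on the covariance structure, the missingness pattern, and the true coefficients.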

Conclusions:

  • The choice between complete-case and available-case analysis for linear regression with missing data is complex.
  • Efficiency is not solely determined by the amount of data used; missing data characteristics are critical.
  • Further research and careful consideration of data properties are needed when applying these deletion methods.