



Updated: Apr 7, 2026


INTEGRATING INCOMPLETE DATA FOR MEDIATION ANALYSIS.

Andriy Derkach¹, Joshua N Sampson², Ruth M Pfeiffer²

¹Department of Epidemiology and Biostatistics, MSKCC, New York, NY 10017, USA.

Statistica Sinica | April 6, 2026

Summary

This summary is machine-generated.

This study introduces novel semiparametric methods for mediation analysis, enabling causal parameter estimation from incomplete datasets. These techniques efficiently combine information from multiple sources, even when only summary statistics are available.

Keywords:

Data integration; direct and indirect effects; semiparametric likelihood; summary level information


Area of Science:

Biostatistics
Epidemiology
Causal Inference

Background:

Mediation analysis typically requires a single, complete dataset with exposure, mediator, and outcome variables.
Existing methods are limited by the need for complete data, hindering analysis when data is fragmented or only summary statistics are available.

Purpose of the Study:

To develop semiparametric methods for mediation analysis that can utilize incomplete datasets.
To enable the estimation of direct and indirect causal effects by combining information from multiple data sources, including summary statistics.
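
For orientation, the direct and indirect effects named above can be written out in the simplest textbook setting. The equations below are a sketch under assumptions that are not taken from the paper: a linear mediator model and a linear outcome model, no exposure-mediator interaction, and no unmeasured confounding. The symbols X (exposure), M (mediator), Y (outcome), and the coefficients are illustrative notation; the paper's semiparametric framework may define its causal parameters differently.

```latex
\begin{align*}
  M &= \beta_0 + \beta_1 X + \eta, \\
  Y &= \theta_0 + \theta_1 X + \theta_2 M + \varepsilon, \\
  \text{direct effect}   &= \theta_1, \\
  \text{indirect effect} &= \beta_1 \theta_2, \\
  \text{total effect}    &= \theta_1 + \beta_1 \theta_2 .
\end{align*}
```

Under these assumptions the indirect effect is the part of the exposure's effect that passes through the mediator, and the direct effect is what remains when the mediator is held fixed.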

Main Methods:

A semiparametric approach is proposed for estimating causal parameters (direct and indirect effects).
The methodology integrates data from several incomplete datasets, each containing only two of the three key variables (exposure, mediator, outcome).
The methods can also be applied when only summary statistics derived from these incomplete datasets are available.
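
To make the idea of combining two-variable datasets concrete, here is a minimal Python sketch. It is not the authors' semiparametric estimator. It swaps in a simple moment-based calculation that assumes linear models, no unmeasured confounding, and that all three datasets are drawn from the same population. The helper names (`pairwise_cov`, `mediation_from_pairs`) and the simulated coefficients are hypothetical. Each dataset contributes one covariance, so only summary statistics (variances and covariances) are needed.

```python
import numpy as np

def pairwise_cov(a, b):
    """Sample variances and covariance from one two-variable dataset."""
    c = np.cov(a, b)  # 2x2 sample covariance matrix
    return c[0, 0], c[1, 1], c[0, 1]

def mediation_from_pairs(xm, xy, my):
    """Hypothetical helper: linear-model mediation effects from three
    incomplete datasets, each a tuple of two 1-D arrays.

    xm = (exposure, mediator), xy = (exposure, outcome), my = (mediator, outcome).
    Assumes linear models and a common underlying population (an assumption
    of this sketch, not a statement about the paper's method).
    """
    vx1, vm1, cxm = pairwise_cov(*xm)
    vx2, _, cxy = pairwise_cov(*xy)
    vm2, _, cmy = pairwise_cov(*my)
    n_xm, n_xy, n_my = len(xm[0]), len(xy[0]), len(my[0])
    # Pool the variances of X and M across the datasets that observe them.
    vx = (n_xm * vx1 + n_xy * vx2) / (n_xm + n_xy)
    vm = (n_xm * vm1 + n_my * vm2) / (n_xm + n_my)
    beta1 = cxm / vx  # slope of M on X
    # Coefficients of Y on (X, M) from the assembled covariance matrix.
    S = np.array([[vx, cxm], [cxm, vm]])
    theta1, theta2 = np.linalg.solve(S, np.array([cxy, cmy]))
    return {"direct": theta1,
            "indirect": beta1 * theta2,
            "total": theta1 + beta1 * theta2}

# Simulated illustration (made-up coefficients: direct 0.3, indirect 0.5 * 0.8).
rng = np.random.default_rng(0)
n = 30000
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.8 * m + rng.normal(size=n)
i1, i2, i3 = np.split(np.arange(n), 3)  # three disjoint groups of subjects
est = mediation_from_pairs((x[i1], m[i1]), (x[i2], y[i2]), (m[i3], y[i3]))
print(est)
```

No subject in the simulated example has all three variables observed, yet the linear-model effects are still identifiable from the three pairwise covariances. The price is extra variance compared with a complete dataset of the same total size.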

Main Results:

The developed methods provide asymptotically unbiased and normally distributed estimates of causal parameters.
Simulations demonstrate the performance of the methods in finite samples and quantify efficiency loss due to incomplete data.
An application to breast cancer risk data investigates whether terminal duct lobular units mediate the relationship between polygenic risk scores and cancer risk.
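
Asymptotic normality of the estimates, as reported in the results above, is what justifies Wald-type confidence intervals. The Python sketch below illustrates this with the classical delta-method (Sobel) standard error for a product-of-coefficients indirect effect. It assumes a linear mediation model and approximately independent estimates of the two coefficients; neither assumption is taken from the paper, whose variance formulas for integrated data are likely to differ. The function name and the input values are hypothetical.

```python
import math

def indirect_effect_wald_ci(beta1, se_beta1, theta2, se_theta2, z=1.96):
    """Delta-method (Sobel) interval for the product beta1 * theta2.

    beta1:  estimated effect of exposure on mediator
    theta2: estimated effect of mediator on outcome, adjusted for exposure
    Assumes the two estimates are approximately independent and normal.
    """
    estimate = beta1 * theta2
    # First-order delta method for the variance of a product.
    se = math.sqrt(theta2 ** 2 * se_beta1 ** 2 + beta1 ** 2 * se_theta2 ** 2)
    return estimate, se, (estimate - z * se, estimate + z * se)

# Illustrative inputs only; these are not results from the study.
est, se, (lo, hi) = indirect_effect_wald_ci(0.5, 0.05, 0.8, 0.10)
print(f"indirect effect = {est:.3f}, SE = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

When the coefficient estimates are correlated, as they can be when the same summary statistics feed both, a covariance term would need to be added to the variance.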

Conclusions:

Semiparametric methods offer a viable solution for mediation analysis when complete data is unavailable.
The proposed approach enhances the utility of fragmented datasets and summary statistics for causal inference.
The study successfully applies these methods to a relevant biomedical question in breast cancer etiology.