A cross-validation statistical framework for asymmetric data integration.

Lam Tran¹, Kevin He¹, Di Wang¹

¹Department of Biostatistics, University of Michigan, Ann Arbor, Michigan.

May 7, 2022

Summary

This summary is machine-generated.

Integrating external clinical data can improve model predictions. This study introduces a novel weighted integration method to minimize errors, enhancing parameter estimation and prediction accuracy for biobanks and clinical datasets.

Keywords:

asymmetric; cross-validation; data integration; leave-one-out error

Area of Science:

Biostatistics
Bioinformatics
Data Science

Background:

Biobanks and public clinical datasets offer valuable resources for research.
Integrating diverse datasets presents challenges due to heterogeneity and opaque protocols.
Naive data integration can introduce bias and lead to unreliable conclusions.

Purpose of the Study:

To develop a novel weighted data integration method for combining local and external clinical datasets.
To address limitations of existing methods, including subjective weight specification and computational intractability.
To improve parameter estimation and model prediction accuracy in the presence of data heterogeneity.

Main Methods:

Proposed a weighted integration method minimizing local data leave-one-out cross-validation (LOOCV) error.
Rewrote the LOOCV error optimization in terms of the external data integration weights for linear and Cox proportional hazards models.
Validated the method using simulations mimicking clinical data heterogeneity and a real-world kidney transplant patient dataset.
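
The idea of choosing an integration weight by minimizing the local leave-one-out error can be illustrated for the linear model. The following is a minimal sketch, not the authors' implementation: it assumes a single scalar weight `w` applied to all external rows, a simple grid search over `w` in [0, 1], and simulated data. All function names and the simulated coefficients are hypothetical. For least squares, the leave-one-out residual of a local observation has the closed form `residual / (1 - leverage)`, so no refitting is needed; the refitting version is included only as a check.

```python
import numpy as np


def integrated_fit(w, X_loc, y_loc, X_ext, y_ext):
    """Weighted least squares: local rows have weight 1, external rows weight w."""
    A = X_loc.T @ X_loc + w * (X_ext.T @ X_ext)
    b = X_loc.T @ y_loc + w * (X_ext.T @ y_ext)
    return np.linalg.solve(A, b), A


def loocv_error(w, X_loc, y_loc, X_ext, y_ext):
    """Mean squared leave-one-out error on the local data, via the leverage shortcut."""
    beta, A = integrated_fit(w, X_loc, y_loc, X_ext, y_ext)
    resid = y_loc - X_loc @ beta
    # Leverage of local row i in the weighted fit: x_i' A^{-1} x_i
    leverage = np.einsum("ij,ji->i", X_loc, np.linalg.solve(A, X_loc.T))
    return float(np.mean((resid / (1.0 - leverage)) ** 2))


def loocv_error_refit(w, X_loc, y_loc, X_ext, y_ext):
    """Same quantity computed by refitting without each local row (slow check)."""
    n = X_loc.shape[0]
    squared = []
    for i in range(n):
        keep = np.arange(n) != i
        beta, _ = integrated_fit(w, X_loc[keep], y_loc[keep], X_ext, y_ext)
        squared.append((y_loc[i] - X_loc[i] @ beta) ** 2)
    return float(np.mean(squared))


def select_weight(X_loc, y_loc, X_ext, y_ext, grid=None):
    """Pick the external weight on a grid that minimizes the local LOOCV error."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)  # assumed search range
    errors = np.array([loocv_error(w, X_loc, y_loc, X_ext, y_ext) for w in grid])
    return float(grid[int(np.argmin(errors))]), errors


# Simulated example: a small local dataset and a larger, slightly
# heterogeneous external dataset (coefficients are illustrative only).
rng = np.random.default_rng(0)
beta_loc = np.array([1.0, -2.0, 0.5])
beta_ext = beta_loc + np.array([0.2, 0.0, -0.2])
X_loc = rng.normal(size=(40, 3))
y_loc = X_loc @ beta_loc + rng.normal(size=40)
X_ext = rng.normal(size=(400, 3))
y_ext = X_ext @ beta_ext + rng.normal(size=400)

w_hat, errors = select_weight(X_loc, y_loc, X_ext, y_ext)
print("selected external weight:", w_hat)
print("LOOCV error at w=0 (local only):", errors[0])
print("LOOCV error at selected weight:", errors.min())
```

A weight of 0 corresponds to ignoring the external data and a weight of 1 to naive pooling; the grid search lets the local data decide where between the two to land. The Cox model case requires a different treatment of the leave-one-out error and is not covered by this sketch.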

Main Results:

Demonstrated significant reductions in estimation error compared to existing methods.
Showed significant reductions in prediction error.
Successfully applied the method to a real-world dataset from the Scientific Registry of Transplant Recipients.

Conclusions:

The proposed weighted integration method offers an objective and computationally tractable approach for combining diverse clinical datasets.
This method effectively reduces estimation and prediction errors, leading to more reliable model outcomes.
The approach holds promise for enhancing biobank and public clinical data utilization in medical research.