A cross-validation statistical framework for asymmetric data integration.

Lam Tran¹, Kevin He¹, Di Wang¹

  • ¹Department of Biostatistics, University of Michigan, Ann Arbor, Michigan.

Biometrics | May 7, 2022
Summary
This summary is machine-generated.

Integrating external clinical data can improve model predictions. This study introduces a weighted integration method that chooses the external-data weights to minimize the leave-one-out cross-validation error on the local data, improving parameter estimation and prediction accuracy for biobank and clinical datasets.

Keywords:
asymmetric cross-validation, data integration, leave-one-out error

Area of Science:

  • Biostatistics
  • Bioinformatics
  • Data Science

Background:

  • Biobanks and public clinical datasets offer valuable resources for research.
  • Integrating diverse datasets presents challenges due to heterogeneity and opaque protocols.
  • Naive data integration can introduce bias and lead to unreliable conclusions.

Purpose of the Study:

  • To develop a novel weighted data integration method for combining local and external clinical datasets.
  • To address limitations of existing methods, including subjective weight specification and computational intractability.
  • To improve parameter estimation and model prediction accuracy in the presence of data heterogeneity.

Main Methods:

  • Proposed a weighted integration method that minimizes the leave-one-out cross-validation (LOOCV) error on the local data.
  • Re-expressed the LOOCV error as a function of the external-data integration weights for linear and Cox proportional hazards models, so that the weights can be optimized directly.
  • Validated the method using simulations mimicking clinical data heterogeneity and a real-world kidney transplant patient dataset.
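
For the linear-model case, the idea behind the methods above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single scalar weight `w` for the whole external dataset, a plain grid search in place of the paper's optimization, and simulated data. All function names (`fit_weighted_ls`, `local_loocv_error`, `local_loocv_error_fast`, `select_weight`) are hypothetical. The "fast" version relies on the standard leave-one-out identity for least squares, which may differ from the derivation used in the paper.

```python
import numpy as np


def fit_weighted_ls(X_loc, y_loc, X_ext, y_ext, w):
    """Least squares with local rows weighted 1 and external rows weighted w."""
    A = X_loc.T @ X_loc + w * (X_ext.T @ X_ext)
    b = X_loc.T @ y_loc + w * (X_ext.T @ y_ext)
    return np.linalg.solve(A, b)


def local_loocv_error(X_loc, y_loc, X_ext, y_ext, w):
    """Brute-force LOOCV: hold out one LOCAL row at a time, keep all external rows."""
    n = len(y_loc)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit_weighted_ls(X_loc[keep], y_loc[keep], X_ext, y_ext, w)
        total += (y_loc[i] - X_loc[i] @ beta) ** 2
    return total / n


def local_loocv_error_fast(X_loc, y_loc, X_ext, y_ext, w):
    """Same quantity from one fit, using residual_i / (1 - h_ii) as the held-out residual."""
    A = X_loc.T @ X_loc + w * (X_ext.T @ X_ext)
    beta = fit_weighted_ls(X_loc, y_loc, X_ext, y_ext, w)
    resid = y_loc - X_loc @ beta
    # h_ii = x_i^T A^{-1} x_i for each local row
    h = np.sum(X_loc.T * np.linalg.solve(A, X_loc.T), axis=0)
    return float(np.mean((resid / (1.0 - h)) ** 2))


def select_weight(X_loc, y_loc, X_ext, y_ext, grid):
    """Pick the external-data weight on the grid with the smallest local LOOCV error."""
    errors = [local_loocv_error_fast(X_loc, y_loc, X_ext, y_ext, w) for w in grid]
    return grid[int(np.argmin(errors))], errors


# Simulated example (assumed values, not data from the study): the external
# coefficients are shifted away from the local ones to mimic heterogeneity.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
X_loc = rng.normal(size=(30, 3))
y_loc = X_loc @ beta_true + rng.normal(size=30)
X_ext = rng.normal(size=(200, 3))
y_ext = X_ext @ (beta_true + 0.3) + rng.normal(size=200)

grid = np.linspace(0.0, 1.0, 11)
best_w, errors = select_weight(X_loc, y_loc, X_ext, y_ext, grid)
print("selected external weight:", best_w)
```

Setting `w = 0` ignores the external data and `w = 1` pools it with the local data at equal weight. Because the grid includes `w = 0`, the selected weight never has a larger local LOOCV error than the local-only fit.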

Main Results:

  • Demonstrated significant reductions in estimation error compared with existing methods.
  • Showed significant reductions in prediction error.
  • Successfully applied the method to a real-world dataset from the Scientific Registry of Transplant Recipients.

Conclusions:

  • The proposed weighted integration method offers an objective and computationally tractable approach for combining diverse clinical datasets.
  • This method effectively reduces estimation and prediction errors, leading to more reliable model outcomes.
  • The approach holds promise for enhancing biobank and public clinical data utilization in medical research.