



Benchmarking missing-values approaches for predictive models on health databases.

Alexandre Perez-Lebel1,2,3, Gaël Varoquaux1,2,3, Marine Le Morvan2

  • 1McConnell Brain Imaging Centre, The Neuro (Montreal Neurological Institute-Hospital), Faculty of Medicine, McGill University, 3801 University Street, Montreal, QC H3A 2B4, Canada.

GigaScience | April 15, 2022
Summary
This summary is machine-generated.

Machine learning models can effectively handle missing values in large health datasets. Native support for missing values in models offers robust, fast, and accurate predictions, outperforming imputation methods.

Keywords:
bagging, benchmark, imputation, machine learning, missing values, multiple imputation, supervised learning


Area of Science:

  • Machine Learning
  • Data Science
  • Health Informatics

Background:

  • Large databases, common in health informatics, often contain missing values, complicating data management and analysis.
  • Existing research on handling missing values primarily focuses on inferential statistics, not predictive modeling.
  • Machine learning models, particularly discriminative approaches, offer new strategies for addressing missing data in large datasets.

Purpose of the Study:

  • To systematically benchmark missing-value strategies for predictive modeling using large health databases.
  • To compare the performance of native handling of missing values versus imputation methods in machine learning.
  • To evaluate prediction accuracy and computational efficiency of different missing-value strategies.

Main Methods:

  • Conducted a benchmark study on six large health datasets (electronic health records, brain imaging, surveys).
  • Utilized gradient-boosted trees to compare native missing-value handling against simple and advanced imputation techniques.
  • Assessed prediction accuracy and computational time for each strategy.
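
The simple-imputation baseline with indicator columns can be illustrated with a short example. This is a minimal sketch in plain Python, not the study's code: the helper names `is_missing` and `impute_mean_with_indicator` and the toy matrix are assumptions made for illustration, and mean imputation stands in for whichever simple imputers were benchmarked.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing (an assumed encoding for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_mean_with_indicator(rows):
    """Mean-impute each column and append one 0/1 indicator per column.

    rows: list of equal-length lists of numbers, with None/NaN for missing.
    Returns new rows: imputed features followed by missingness indicators.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not is_missing(row[j])]
        # Fall back to 0.0 if a column is entirely missing (arbitrary choice).
        means.append(sum(observed) / len(observed) if observed else 0.0)

    out = []
    for row in rows:
        imputed = [means[j] if is_missing(v) else v for j, v in enumerate(row)]
        indicator = [1 if is_missing(v) else 0 for v in row]
        out.append(imputed + indicator)
    return out


# Hypothetical toy data: two features, three samples.
X = [[1.0, 10.0], [None, 20.0], [3.0, float("nan")]]
X_aug = impute_mean_with_indicator(X)
# X_aug == [[1.0, 10.0, 0, 0], [2.0, 20.0, 1, 0], [3.0, 15.0, 0, 1]]
```

The augmented matrix keeps the information about which entries were filled in, so a downstream model can use it. In scikit-learn, `SimpleImputer(add_indicator=True)` offers similar behaviour.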

Main Results:

  • Native handling of missing values within gradient-boosted trees demonstrated robust, fast, and accurate predictive performance.
  • Imputation methods, while potentially improving prediction, incurred significantly longer computational times on large datasets.
  • Adding indicator columns that flag imputed values was crucial for imputation-based strategies, suggesting that missingness itself is informative (the data were not missing at random).
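
Why an indicator can matter is easiest to see on a small invented example. The sketch below assumes a hypothetical lab test that is ordered mostly for sicker patients; the data, the variable names, and the helper `outcome_rate_by_missingness` are illustrative and do not come from the study.

```python
def outcome_rate_by_missingness(values, outcomes):
    """Return (outcome rate when the value is observed, rate when missing)."""
    observed = [y for v, y in zip(values, outcomes) if v is not None]
    missing = [y for v, y in zip(values, outcomes) if v is None]

    def rate(ys):
        return sum(ys) / len(ys) if ys else float("nan")

    return rate(observed), rate(missing)


# Toy data, invented for illustration: None marks a test that was not ordered.
lab_value = [5.1, None, 4.8, None, None, 5.3, None, 5.0]
outcome = [1, 0, 1, 0, 0, 0, 1, 1]

rate_observed, rate_missing = outcome_rate_by_missingness(lab_value, outcome)
# rate_observed == 0.75, rate_missing == 0.25
```

Here the observed lab values are nearly identical, so the measurement itself says little, while the fact that it was measured at all separates the two groups. Imputing the gaps without an indicator would discard that signal.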

Conclusions:

  • Supervised machine learning models with native support for missing values provide superior prediction accuracy with lower computational cost compared to imputation.
  • When imputation is employed, adding indicator columns to denote imputed data is essential for optimal performance.
  • Learning algorithms that incorporate missing values directly (missing incorporated in attribute, MIA) are efficient and effective for large-scale health data.
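
The idea behind handling missing values inside a tree split can be sketched as follows. This is a simplified illustration under stated assumptions (one feature, squared-error loss, `None` as the missing marker, hypothetical function names `sse` and `best_mia_split`), not the implementation used in the study or in any particular library. For each candidate threshold, the split is evaluated twice, once with missing values sent left and once sent right, and a "missing versus observed" split is also considered.

```python
def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_mia_split(xs, ys):
    """Find the best single split of one feature, treating missingness as data.

    xs: feature values, with None for missing. ys: numeric targets.
    Returns (threshold, missing_goes_left, loss); threshold is None when the
    best split is 'missing vs. observed'.
    """
    observed = sorted({x for x in xs if x is not None})
    # Candidate thresholds: midpoints between consecutive observed values.
    thresholds = [(a + b) / 2 for a, b in zip(observed, observed[1:])]

    candidates = []
    for t in thresholds:
        for missing_left in (True, False):
            left, right = [], []
            for x, y in zip(xs, ys):
                goes_left = missing_left if x is None else x <= t
                (left if goes_left else right).append(y)
            candidates.append((sse(left) + sse(right), t, missing_left))

    # Extra option: all missing values on one side, all observed on the other.
    miss = [y for x, y in zip(xs, ys) if x is None]
    obs = [y for x, y in zip(xs, ys) if x is not None]
    candidates.append((sse(miss) + sse(obs), None, True))

    loss, threshold, missing_left = min(candidates, key=lambda c: c[0])
    return threshold, missing_left, loss


# Toy data, invented for illustration: missing values behave like large x.
xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
threshold, missing_left, loss = best_mia_split(xs, ys)
# threshold == 5.0, missing_left is False, loss == 0.0
```

Because the direction for missing values is chosen from the data at each split, no separate imputation pass is needed, which is consistent with the lower computational cost reported above.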