Task-Based Sampling of Patient Data for Rigorous Machine Learning/AI Performance Assessment.

Natalie Baughan¹,², Heather M Whitney³, Karen Drukker³

¹Department of Radiation Oncology, Henry Ford Health, Detroit, MI, 48202, USA. nbaugha1@hfhs.org.

Journal of Imaging Informatics in Medicine

March 10, 2026

Summary

This summary is machine-generated.

A new task-based sampling algorithm helps create representative AI training datasets. This method reduces sampling bias by matching data to intended patient populations for improved AI performance assessment.

Area of Science:

Medical informatics
Artificial intelligence in healthcare
Data science

Background:

AI algorithm performance assessment requires independent datasets representative of the intended clinical population.
Using all available data can be impractical and may introduce sampling bias.
Representative data is crucial for reliable AI model training and validation.

Purpose of the Study:

To develop and demonstrate a computational method for task-based data sampling from large repositories.
To generate datasets matched to specific demographic and clinical profiles for AI performance assessment.
To mitigate sampling bias in AI algorithm development and evaluation.

Keywords:

Algorithm performance, Bias mitigation, Data sampling, Image database, Machine learning

Main Methods:

A task-based sampling algorithm was developed, requiring users to define an initial cohort, target distribution, and allowable deviation.
The algorithm was applied to the Medical Imaging and Data Resource Center (MIDRC) data commons.
Demographic characteristics and disease states were used as clinical attributes for matching to an intended population profile (e.g., CDC demographics).
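
The summary names three user inputs (an initial cohort, a target distribution, and an allowable deviation) but does not describe the algorithm's internals. The Python sketch below is therefore only an illustration of how such inputs could drive a matched draw; it is not the authors' method. It assumes simple quota sampling on a single categorical attribute, and the function name `sample_matched_cohort`, the record layout, and the `sex` attribute are all hypothetical.

```python
import random
from collections import defaultdict


def sample_matched_cohort(cohort, attribute, target, tolerance, seed=0):
    """Return the largest random subset whose proportions match `target`.

    cohort:    list of dicts, one per patient (hypothetical record layout)
    attribute: key of the categorical attribute to match on
    target:    dict mapping category -> desired proportion (should sum to 1)
    tolerance: allowable absolute deviation per category, e.g. 0.02
    """
    rng = random.Random(seed)

    # Group patients by the value of the attribute being matched.
    by_category = defaultdict(list)
    for patient in cohort:
        by_category[patient[attribute]].append(patient)

    # Try the largest sample size first and shrink until all quotas fit.
    for n in range(len(cohort), 0, -1):
        quotas = {c: round(n * p) for c, p in target.items()}
        total = sum(quotas.values())
        if total == 0:
            continue
        # Every category must have enough patients to fill its quota.
        if any(q > len(by_category[c]) for c, q in quotas.items()):
            continue
        # Rounded quotas must stay within the allowable deviation.
        if all(abs(q / total - target[c]) <= tolerance
               for c, q in quotas.items()):
            sample = []
            for c, q in quotas.items():
                sample.extend(rng.sample(by_category[c], q))
            return sample
    return []  # no sample size satisfies the constraints


# Hypothetical initial cohort: 80 patients in one category, 20 in the other.
initial = [{"id": i, "sex": "F" if i < 80 else "M"} for i in range(100)]
matched = sample_matched_cohort(initial, "sex", {"F": 0.5, "M": 0.5}, 0.02)
print(len(matched), "patients sampled")
```

In this simplification the scarcest category limits the sample size, which is one way a matched cohort can end up much smaller than the initial one. Matching several attributes at once, as the study does with demographics and disease states, would need joint quotas or an iterative scheme that this sketch does not attempt.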

Main Results:

The algorithm successfully sampled two cohorts (542 and 870 patients) from an initial cohort of more than 4,000 patients.
The sampled cohorts closely matched the target demographic distribution, with average clinical attribute differences of 1.0% and 2.1%, respectively.
The method demonstrated effectiveness in generating matched samples for AI performance assessment.
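
The summary reports average clinical attribute differences but does not say how they were computed. As a hedged illustration only, the sketch below assumes the figure is the mean absolute difference between sampled and target proportions across categories, expressed in percentage points. The function name `mean_attribute_difference` and the example values are hypothetical and are not data from the study.

```python
from collections import Counter


def mean_attribute_difference(sample_values, target):
    """Mean absolute gap between sampled and target proportions, in points.

    sample_values: attribute value of each sampled patient
    target:        dict mapping category -> desired proportion
    """
    counts = Counter(sample_values)
    n = len(sample_values)
    # Absolute difference per target category; absent categories count as 0.
    diffs = [abs(counts.get(c, 0) / n - p) for c, p in target.items()]
    return 100.0 * sum(diffs) / len(diffs)


# Made-up sample: 52 patients in one category and 48 in the other.
values = ["F"] * 52 + ["M"] * 48
print(round(mean_attribute_difference(values, {"F": 0.5, "M": 0.5}), 1))
```

A metric like this can be compared directly with the user-defined allowable deviation, though the study may define both quantities differently.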

Conclusions:

The developed task-based sampling algorithm effectively generates matched samples from large datasets.
This approach reduces sampling bias, enhancing the reliability of AI algorithm training and performance assessment.
The method provides a valuable tool for creating representative datasets in medical AI research.