How (Not) to Generate a Highly Predictive Biomarker Panel Using Machine Learning.

Heather Desaire

Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States.

Journal of Proteome Research | August 25, 2022
Summary
This summary is machine-generated.

Researchers can avoid inflated machine learning results in proteomics by preventing data leakage. This cautionary review shows how flawed feature selection leads to unreliable biomarker discovery, and it emphasizes correct cross-validation practices.

Keywords:
AUC, biomarker, classification, feature selection, machine learning, overfitting, proteomics, validation, XGBoost

Area of Science:

  • Proteomics
  • Bioinformatics
  • Machine Learning

Background:

  • Biomarker discovery in proteomics often employs machine learning.
  • Feature selection is a critical step in building predictive models.
  • Inappropriate feature selection can lead to overestimated model performance.

Purpose of the Study:

  • To demonstrate a common data processing error in proteomics biomarker studies.
  • To illustrate how biased feature selection inflates machine learning model accuracy.
  • To provide guidance on applying machine learning to proteomics data correctly.

Main Methods:

  • Demonstration of a flawed feature selection strategy.
  • Building a classification model using biased feature selection.
  • Simulating a dataset to highlight the impact of data leakage.

Main Results:

  • An artificially high classification accuracy of 92% and an AUC of 0.98 were achieved.
  • The inflated performance was demonstrated on a dataset built from random numbers, which contains no real signal to learn.
  • The study identified test data leakage into the feature selection step as the core issue.
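
For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how accuracy and AUC can be computed from class labels and classifier scores. The function names and the four-sample toy data are illustrative assumptions, not taken from the paper; AUC is computed here with the pairwise-ranking (Mann-Whitney) formulation, with ties counted as half a win.

```python
def accuracy(labels, predictions):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(int(l == p) for l, p in zip(labels, predictions)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [int(s >= 0.5) for s in scores]  # hypothetical 0.5 threshold

toy_accuracy = accuracy(labels, predictions)
toy_auc = auc(labels, scores)
print(toy_accuracy, toy_auc)
```

Both metrics are only as trustworthy as the held-out data they are computed on: if the test samples influenced any earlier step, including feature selection, the reported values can be far higher than the model's real performance.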

Conclusions:

  • Biomarker panels generated by selecting features across all data before cross-validation are unreliable.
  • Test data leakage during feature selection is a common pitfall in machine learning for proteomics.
  • Correct application of machine learning requires careful separation of feature selection and model validation to prevent inflated accuracies.
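
The leakage described in these conclusions can be sketched on pure-noise data. The example below is a minimal illustration under stated assumptions, not the paper's actual procedure: the sample and feature counts, the class-mean-gap feature ranking, and the nearest-centroid classifier are all hypothetical stand-ins chosen to keep the code dependency-free. The `leaky=True` variant selects features once from all samples before cross-validation; the other variant re-selects features inside each fold from the training rows only.

```python
import random


def mean(values):
    return sum(values) / len(values)


def make_random_data(n_samples=40, n_features=1000, seed=0):
    """Pure-noise features with alternating class labels: nothing to learn."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def top_features(X, y, rows, k):
    """Indices of the k features with the largest class-mean gap on `rows`."""
    gaps = []
    for j in range(len(X[0])):
        m0 = mean([X[i][j] for i in rows if y[i] == 0])
        m1 = mean([X[i][j] for i in rows if y[i] == 1])
        gaps.append((abs(m0 - m1), j))
    gaps.sort(reverse=True)
    return [j for _, j in gaps[:k]]


def predict(X, y, train, row, feats):
    """Nearest-centroid prediction for one held-out row."""
    dist = []
    for label in (0, 1):
        members = [i for i in train if y[i] == label]
        centroid = [mean([X[i][j] for i in members]) for j in feats]
        dist.append(sum((X[row][j] - c) ** 2
                        for j, c in zip(feats, centroid)))
    return 0 if dist[0] <= dist[1] else 1


def cv_accuracy(X, y, k=10, n_folds=5, leaky=False):
    rows = list(range(len(X)))
    folds = [rows[f::n_folds] for f in range(n_folds)]
    # Leaky variant: features chosen once from ALL rows, test rows included.
    leaked = top_features(X, y, rows, k) if leaky else None
    hits = 0
    for test in folds:
        train = [i for i in rows if i not in test]
        # Honest variant: features re-chosen from this fold's training rows.
        feats = leaked if leaky else top_features(X, y, train, k)
        hits += sum(predict(X, y, train, i, feats) == y[i] for i in test)
    return hits / len(rows)


X, y = make_random_data()
leaky_acc = cv_accuracy(X, y, leaky=True)
honest_acc = cv_accuracy(X, y, leaky=False)
print(f"leaky: {leaky_acc:.2f}  honest: {honest_acc:.2f}")
```

Because the labels are unrelated to the features, any accuracy well above chance in the leaky variant comes from the test rows having helped choose the features. Moving the selection step inside the cross-validation loop removes that advantage, which is the separation the conclusions call for.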