Partially supervised learning using an EM-boosting algorithm.

Yutaka Yasui¹, Margaret Pepe, Li Hsu

  • ¹ Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, USA. yyasui@fhcrc.org

Biometrics | March 23, 2004
Summary
This summary is machine-generated.

This study introduces EM-Boost, a partially supervised learning algorithm for classifying cancer versus non-cancer from protein mass-spectrometry data when diagnostic labels are imperfect. On artificially mislabeled data, EM-Boost lowered misclassification rates relative to the original boosting algorithm.

Area of Science:

  • Biomedical data analysis
  • Machine learning in healthcare
  • Bioinformatics

Background:

  • Supervised learning aims to build accurate classifiers from labeled training data.
  • Biomedical applications often have incorrect class labels in the training data because diagnostic certainty is imperfect, as in cancer detection using protein mass-spectrometry.
  • Traditional supervised learning methods struggle when training data contains errors in class labels.

Purpose of the Study:

  • To develop a learning algorithm that is robust to imperfectly labeled training data, a setting termed partially supervised learning.
  • To adapt the boosting algorithm for high-dimensional data, specifically protein mass-spectrometry data, so that it can handle mislabeled samples.
  • To improve the accuracy of cancer versus non-cancer classification from serum samples.

Main Methods:

  • Proposed a modification of the boosting algorithm, named EM-Boost, to address partially supervised learning scenarios.
  • Treated true class membership of mislabeled samples as missing data.
  • Employed an algorithm related to the Expectation-Maximization (EM) algorithm for loss function minimization.
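
The idea in the bullets above, treating the true class of possibly mislabeled samples as missing data and alternating an EM-style update with boosting rounds, can be illustrated with a rough sketch. This is not the authors' EM-Boost: the logistic loss, the regression stumps, the symmetric flip probability `rho`, and every function name below are assumptions made for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump(X, r):
    """Least-squares regression stump fitted to targets r."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        # Drop the largest value so both sides of the split are non-empty.
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    return best

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def em_boost_sketch(X, y_obs, uncertain, rho=0.2, rounds=50, lr=0.5):
    """Hypothetical EM-style boosting loop (illustrative assumption only).

    y_obs     : observed labels in {0, 1}
    uncertain : boolean mask of samples whose label may be wrong
    rho       : assumed flip probability for uncertain labels, 0 < rho < 1
    """
    F = np.zeros(len(y_obs))  # additive score; P(class 1 | x) = sigmoid(2F)
    stumps = []
    for _ in range(rounds):
        s = sigmoid(2.0 * F)
        # E-step: posterior that the true class is 1, given x and the
        # observed label, under the assumed flip model.
        like1 = np.where(y_obs == 1, 1.0 - rho, rho)  # P(observed | true = 1)
        like0 = np.where(y_obs == 1, rho, 1.0 - rho)  # P(observed | true = 0)
        post = s * like1 / (s * like1 + (1.0 - s) * like0)
        # Trusted samples keep their hard label; uncertain ones get a soft one.
        q = np.where(uncertain, post, y_obs.astype(float))
        # Partial M-step: one boosting round on the negative gradient of the
        # expected logistic loss, which simplifies to 2 * (q - s).
        stump = fit_stump(X, 2.0 * (q - s))
        F += lr * predict_stump(stump, X)
        stumps.append(stump)
    return {"stumps": stumps, "lr": lr}

def em_boost_predict(model, X):
    F = np.zeros(X.shape[0])
    for stump in model["stumps"]:
        F += model["lr"] * predict_stump(stump, X)
    return (F > 0).astype(int)
```

In this sketch, a sample marked uncertain contributes a soft label that moves toward the class the current classifier favors, so a confidently contradicted label loses influence over later boosting rounds. Setting `uncertain` to all `False` reduces the loop to ordinary gradient boosting on the observed labels.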

Main Results:

  • EM-Boost achieved lower misclassification rates than the original boosting algorithm.
  • The method was validated using artificially mislabeled protein mass-spectrometry data.
  • The proposed approach effectively handles the uncertainty inherent in diagnostic labels.
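
The evaluation setup described above, artificially mislabeling data and then comparing misclassification rates, might look like the following sketch. The summary does not say how labels were flipped, so the independent, uniform flipping used here is an assumption, and `flip_labels` and `misclassification_rate` are hypothetical helper names.

```python
import numpy as np

def flip_labels(y, rate, rng):
    """Flip each binary (0/1) label independently with probability `rate`.

    Returns the noisy labels and a boolean mask of which entries were flipped.
    """
    y = np.asarray(y)
    flips = rng.random(y.shape[0]) < rate
    return np.where(flips, 1 - y, y), flips

def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))
```

A comparison would train each classifier on the noisy labels and score both against the original, unflipped labels, so that the mislabeling affects training but not the reported error.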

Conclusions:

  • The EM-Boost algorithm extends supervised learning to imperfectly labeled training data in biomedical applications.
  • This method enhances the reliability of classifiers built from potentially erroneous diagnostic information, such as in cancer detection.
  • Partially supervised learning, as implemented by EM-Boost, is a promising direction for improving diagnostic accuracy in data-driven scientific fields.