Genomic data sampling and its effect on classification performance assessment.

Francisco Azuaje1

  • 1School of Computing and Mathematics, University of Ulster, Jordanstown, Northern Ireland, UK. fj.azuaje@ulster.ac.uk

BMC Bioinformatics | January 30, 2003
Summary
This summary is machine-generated.

The choice of data sampling technique significantly affects how the accuracy of a neural network classifier is estimated, especially with small biological datasets. Choosing an appropriate method is crucial for reliable predictions in gene discovery and disease classification.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Machine Learning

Background:

  • Supervised classification with machine learning models such as neural networks is essential to bioinformatics tasks such as gene discovery.
  • Estimating a classifier's predictive accuracy is unreliable when only a small number of samples is available.

Purpose of the Study:

  • This study investigates the impact of various data sampling techniques on the assessment of neural network classifiers.
  • The research aims to understand how different sampling methods affect accuracy estimations in small-data scenarios.

Main Methods:

  • The study examined three data sampling techniques: cross-validation, leave-one-out, and bootstrap.
  • These techniques were applied to small-sample datasets for classification problems, including microarray data (leukemia, small round blue-cell tumors) and splice-junction prediction.
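
The three sampling schemes named above can be sketched as generators of train/test index splits. This is a minimal illustration using only the Python standard library, not the study's implementation: the function names, the k-fold form of cross-validation, and the use of out-of-bag samples as the bootstrap test set are all assumptions made for the example.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumption: plain shuffled k-fold; the study may have split differently.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def leave_one_out_splits(n):
    """Yield n splits, each holding out exactly one sample for testing."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def bootstrap_splits(n, n_rounds, seed=0):
    """Yield splits whose training set is n draws with replacement.

    Assumption: samples never drawn (out-of-bag) form the test set,
    which can in rare cases be empty for very small n.
    """
    rng = random.Random(seed)
    for _ in range(n_rounds):
        train = [rng.randrange(n) for _ in range(n)]
        drawn = set(train)
        test = [i for i in range(n) if i not in drawn]
        yield train, test
```

Each generator yields index lists, so any classifier can be trained on `train` and scored on `test` inside a loop such as `for train, test in leave_one_out_splits(len(samples)): ...`. Leave-one-out always runs exactly `n` experiments, whereas the number of bootstrap rounds is a free parameter.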

Main Results:

  • The sampling techniques produced different accuracy estimates, and the differences were larger for small datasets.
  • The quality of accuracy estimates is influenced by the number of train-test experiments and the amount of training data.
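
One way to see why the number of train-test experiments matters is to summarize the per-experiment accuracies by their mean and standard error. The helper below is a hypothetical sketch, not taken from the study; `summarize_accuracies` is an illustrative name, and the standard error it reports treats the experiments as independent, which resampled splits of the same data only approximate.

```python
import math
import statistics


def summarize_accuracies(accuracies):
    """Return (mean, standard_error) over per-experiment accuracies.

    Assumption: experiments are treated as independent draws, so the
    standard error is a rough guide rather than an exact interval.
    """
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two train-test experiments")
    mean = statistics.fmean(accuracies)
    # Sample standard deviation divided by sqrt(n).
    std_err = statistics.stdev(accuracies) / math.sqrt(n)
    return mean, std_err
```

For a fixed spread of accuracies, the standard error shrinks in proportion to the square root of the number of experiments, so a mean based on few experiments is a much less stable estimate than one based on many.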

Conclusions:

  • Reliable assessment of the predictive quality of biomolecular data classifiers depends on dataset size, the sampling method, and the number of train-test experiments.
  • Different sampling techniques yield conservative or optimistic accuracy estimates, so the technique should be chosen carefully according to the complexity of the prediction problem.