Validation of computational methods in genomics.

Edward R Dougherty¹, Hua Jianping, Michael L Bittner

  • ¹Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, USA. edward@ece.tamu.edu

Current Genomics | July 23, 2008
Summary
This summary is machine-generated.

Machine learning in genomics requires robust validation, especially for classification and clustering. This study addresses the critical need for understanding algorithm performance in high-dimensional genomic data, particularly with small sample sizes, to ensure reliable clinical applications.

Area of Science:

  • Genomics
  • Machine Learning
  • Statistical Inference

Background:

  • High-throughput genomics generates vast datasets, driving machine learning adoption for tasks like classification and clustering.
  • A scientific theory requires both a mathematical model and experimental validation, a standard that models produced by machine learning algorithms do not always meet.
  • Genomic studies often have few samples relative to the number of measured features, so the asymptotic convergence results behind many learning algorithms offer little assurance at the sample sizes actually available.
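
The small-sample, high-dimension problem described above can be illustrated with a minimal sketch. It is not taken from the paper: the function name `best_apparent_error`, the one-feature threshold rule, and the sample and feature counts are assumptions chosen for illustration. The features are pure noise and independent of the labels, so every such classifier has a true error of 0.5, yet the best training-set (apparent) error drops as more features are screened.

```python
import random


def best_apparent_error(n_samples, n_features, seed=0):
    """Lowest training error among one-feature threshold rules on pure-noise data.

    Hypothetical illustration: labels and features are independent, so the
    true error of every rule considered here is 0.5.
    """
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    best = 0.5
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        # Rule: predict class 1 when the feature is positive.
        errors = sum((x > 0) != bool(y) for x, y in zip(feature, labels))
        err = errors / n_samples
        # Allow the rule to be flipped (predict class 1 when negative).
        err = min(err, 1.0 - err)
        best = min(best, err)
    return best


if __name__ == "__main__":
    for d in (20, 200, 2000):
        print(d, "features -> best apparent error", best_apparent_error(20, d))
```

With only 20 samples, screening thousands of noise features almost always turns up a rule that looks accurate on the training data, which is why an apparent error by itself says little about validity.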

Purpose of the Study:

  • To investigate the critical issue of model validation for machine learning algorithms used in genomics.
  • To formulate the validation problem specifically for classification and clustering inference in the context of genomic data.
  • To review existing results on how well the performance of these algorithms is understood, with a view to translational genomics.

Main Methods:

  • Review of validation theories for machine learning algorithms, focusing on classification and clustering.
  • Formulation of the validation problem in the context of high-dimensional genomic data with limited sample sizes.
  • Analysis of the relationship between learning algorithms and model validation in genomics.
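
One standard way to state the validation problem for classification is sketched below. The notation is an assumption introduced here for illustration and is not quoted from the paper: $\psi_n$ is the classifier designed from a sample $S_n$ of size $n$, $(X, Y)$ is a feature-label pair, and $\hat{\varepsilon}_n$ is any error estimate computed from $S_n$ (for example by resubstitution or cross-validation).

```latex
% Assumed notation, not taken from the paper.
\begin{align}
  \varepsilon_n &= P\bigl(\psi_n(X) \neq Y \mid S_n\bigr)
    && \text{(true error of the designed classifier)} \\
  \mathrm{RMS}(\hat{\varepsilon}_n)
    &= \sqrt{E\bigl[(\hat{\varepsilon}_n - \varepsilon_n)^2\bigr]}
    && \text{(accuracy of the error estimate)} \\
    &= \sqrt{\mathrm{Var}(\hat{\varepsilon}_n - \varepsilon_n)
       + \bigl(E[\hat{\varepsilon}_n - \varepsilon_n]\bigr)^2}
    && \text{(deviation variance plus squared bias)}
\end{align}
```

Under this formulation, a classifier is validated only to the extent that $\hat{\varepsilon}_n$ is close to $\varepsilon_n$, and with small $n$ the variance term can be large even when the estimator is nearly unbiased.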

Main Results:

  • Machine learning algorithms do not automatically produce scientifically valid models.
  • Validation is particularly challenging in genomics due to high dimensionality and small sample sizes.
  • Validation of clustering algorithms is much less well understood than validation of classifiers.
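
Part of what makes clustering harder to validate is that cluster labels are arbitrary, so a clustering can only be scored against known groups after the labels are matched. The sketch below shows one common way to do this when ground truth is available (for instance in simulated data). It is an illustrative assumption, not the measure used in the paper, and the name `clustering_error` is hypothetical.

```python
from itertools import permutations


def clustering_error(true_labels, cluster_labels, k):
    """Fraction of points mislabeled under the best relabeling of k clusters.

    Hypothetical helper: both label lists must use the integers 0..k-1.
    Brute-force matching is only practical for small k.
    """
    n = len(true_labels)
    best = n
    for perm in permutations(range(k)):
        # perm[c] is the true class that cluster c is mapped to.
        mismatches = sum(perm[c] != t for c, t in zip(cluster_labels, true_labels))
        best = min(best, mismatches)
    return best / n


if __name__ == "__main__":
    # Same partition with swapped label names: no error.
    print(clustering_error([0, 0, 1, 1], [1, 1, 0, 0], 2))
    # One point assigned to the wrong group.
    print(clustering_error([0, 0, 1, 1], [0, 1, 1, 1], 2))
```

Real genomic data rarely comes with ground-truth groups, so a score like this is usually available only in simulation studies.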

Conclusions:

  • Robust validation of machine learning models is essential for reliable translational genomics and clinical applications.
  • Addressing the validation gap in genomic data analysis is crucial for ensuring the performance of diagnostic and therapeutic procedures.
  • Further research is needed to develop and understand validation methods for genomic inference algorithms, especially clustering.