Overcome support vector machine diagnosis overfitting.

Henry Han¹, Xiaoqian Jiang²

  • ¹Department of Computer and Information Science, Fordham University, New York, NY, USA; Quantitative Proteomics Center, Columbia University, New York, NY, USA.

Cancer Informatics | January 10, 2015 | PubMed
Summary
This summary is machine-generated.

Support vector machines (SVMs) can overfit high-dimensional omics data in disease diagnosis. A novel sparse-coding kernel approach overcomes this overfitting, improving diagnostic accuracy and enabling biomarker discovery.

Keywords: SVM, biomarker discovery, overfitting

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Machine Learning in Medicine

Background:

  • Support vector machines (SVMs) are widely used for molecular disease diagnosis.
  • Overfitting in high-dimensional omics data poses a risk to diagnostic accuracy and clinical decisions.
  • Prior research has not comprehensively analyzed SVM overfitting in this context.

Purpose of the Study:

  • To investigate the characteristics of SVM overfitting in high-dimensional omics data for disease diagnosis.
  • To develop a novel method to mitigate SVM overfitting and enhance diagnostic performance.
  • To introduce a new biomarker-discovery algorithm that leverages SVM overfitting.

Main Methods:

  • Theoretical and practical analysis of SVM overfitting with Gaussian kernels in omics data.
  • Development and application of a sparse-coding kernel approach.
  • Introduction of the Gene-Switch-Marker (GSM) algorithm for biomarker identification.

Main Results:

  • SVM classifiers with Gaussian kernels are prone to overfitting in high-dimensional omics data due to inherent data variations.
  • The proposed sparse-coding kernel approach effectively addresses SVM overfitting.
  • The novel approach achieves robust performance and good diagnostic accuracy, outperforming traditional methods.
  • The GSM algorithm successfully captures meaningful biomarkers by exploiting SVM overfitting on single genes.
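The first result, that Gaussian-kernel SVMs overfit high-dimensional omics data, can be illustrated with a small experiment. This is a minimal sketch and not the authors' analysis: it assumes synthetic data (20 samples with 2,000 independent standard-normal features as a stand-in for an omics profile) and a fixed, hypothetical bandwidth `SIGMA = 10.0`, both chosen only for illustration. When the number of features p is large relative to the bandwidth, squared distances between samples grow roughly like 2p, so the Gaussian kernel matrix K(x, y) = exp(-||x - y||² / (2σ²)) approaches the identity matrix. An SVM trained on a near-identity kernel can separate any labeling of its training samples, while its decision function f(x) = Σ αᵢ yᵢ K(xᵢ, x) + b collapses toward the bias b on unseen samples.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(samples, sigma):
    """Gram matrix with K[i][j] = k(samples[i], samples[j])."""
    return [[gaussian_kernel(x, y, sigma) for y in samples] for x in samples]


def max_off_diagonal(K):
    """Largest kernel similarity between two distinct training samples."""
    n = len(K)
    return max(K[i][j] for i in range(n) for j in range(n) if i != j)


def synthetic_samples(n_samples, n_features, rng):
    """Hypothetical data: i.i.d. standard-normal features, no real omics signal."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]


rng = random.Random(0)
SIGMA = 10.0  # illustrative fixed bandwidth, not a value from the paper

# Omics-like regime: few samples, many features (n << p).
high_dim = synthetic_samples(20, 2000, rng)
K_high = kernel_matrix(high_dim, SIGMA)

# Low-dimensional contrast using the same bandwidth.
low_dim = synthetic_samples(20, 5, rng)
K_low = kernel_matrix(low_dim, SIGMA)

# Kernel values between one unseen high-dimensional sample and each training sample.
new_sample = synthetic_samples(1, 2000, rng)[0]
k_new = [gaussian_kernel(x, new_sample, SIGMA) for x in high_dim]

print("diagonal entry:", K_high[0][0])
print("max off-diagonal, p = 2000:", max_off_diagonal(K_high))
print("max off-diagonal, p = 5:", max_off_diagonal(K_low))
print("max k(x_i, new sample), p = 2000:", max(k_new))
```

In the low-dimensional case the off-diagonal entries stay close to 1, so the collapse comes from dimensionality relative to the bandwidth rather than from the kernel form alone. Choosing σ on the scale of the typical inter-sample distance avoids this particular collapse; the paper's sparse-coding kernel is a different remedy whose details are not reproduced here.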

Conclusions:

  • SVM overfitting is a significant challenge in omics-based disease diagnosis, particularly with Gaussian kernels.
  • The sparse-coding kernel approach offers a rigorous and effective solution to SVM overfitting.
  • This work presents the first dedicated method to overcome SVM overfitting in this domain.
  • The developed methods advance both diagnostic accuracy and biomarker discovery in molecular medicine.