SMOTE for high-dimensional class-imbalanced data.

Rok Blagus¹, Lara Lusa

¹Institute for Biostatistics and Medical Informatics, University of Ljubljana, Ljubljana, Slovenia.

BMC Bioinformatics

March 26, 2013

Summary

This summary is machine-generated.

Synthetic Minority Oversampling TEchnique (SMOTE) is often ineffective for high-dimensional imbalanced data. For k-NN classifiers, SMOTE is only beneficial with high-dimensional data if variable selection is performed first.

Area of Science:

Machine Learning
Data Science
Statistics

Background:

Class-imbalanced data leads to classification bias favoring the majority class, especially in high-dimensional settings.
Undersampling and oversampling both aim to balance the class distribution; in high-dimensional settings, undersampling is generally helpful, whereas random oversampling is not.
Synthetic Minority Oversampling TEchnique (SMOTE) is a popular oversampling method, but its performance on high-dimensional data requires further investigation.
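For readers unfamiliar with the method, the sketch below illustrates the core SMOTE step: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. This is a minimal illustration under stated assumptions, not the implementation used in the study; the function name `smote`, its parameters, and the use of Euclidean distance are choices made here for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    minority: list of equal-length numeric lists (minority-class rows).
    k: number of nearest minority neighbours to choose from (assumed
       Euclidean distance; hypothetical default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic
```

Because every synthetic point is a convex combination of two existing minority samples, the new points never leave the region already spanned by the minority class.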

Purpose of the Study:

To theoretically and empirically investigate the properties of SMOTE on high-dimensional data.
To evaluate SMOTE's effectiveness in mitigating classification bias with imbalanced, high-dimensional datasets.
To compare SMOTE's performance against random undersampling for high-dimensional data.
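Random undersampling, the baseline that SMOTE is compared against, can be sketched as follows: every class is reduced to the size of the smallest class by drawing rows at random without replacement. The function name `random_undersample` and its signature are hypothetical, introduced only for this illustration.

```python
import random


def random_undersample(X, y, seed=0):
    """Shrink every class to the size of the smallest class.

    X: list of rows; y: list of class labels of the same length.
    Rows are drawn without replacement within each class.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out
```

Unlike SMOTE, this approach creates no new samples; it balances the classes by discarding majority-class rows.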

Main Methods:

Theoretical analysis of SMOTE's behavior on high-dimensional data.
Empirical evaluation using simulated and real-world high-dimensional datasets.
Assessment of SMOTE's impact on various classifiers, including k-NN, with and without variable selection.

Main Results:

SMOTE generally fails to reduce majority class bias in high-dimensional data for most classifiers and is less effective than random undersampling.
SMOTE benefits k-NN classifiers for high-dimensional data only when variable selection is performed prior to its application.
On high-dimensional data, SMOTE preserves the mean of the minority class but reduces its variability and introduces correlation between some samples.
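The mean-preserving, variance-reducing behaviour can be checked with a small simulation. The sketch below makes a simplifying assumption that is not taken from the summary above: the neighbour is replaced by a second, randomly chosen sample, so the two endpoints of each interpolation are treated as roughly independent. Under that assumption a synthetic value is (1 − u)·a + u·b with u uniform on [0, 1], so its mean equals the original mean and its variance is (1/3 + 1/3) = 2/3 of the original variance. The function names are hypothetical.

```python
import random
import statistics


def interpolate_pairs(values, n_synthetic, seed=0):
    """Interpolate between two distinct, randomly chosen values.

    Assumption: a random partner stands in for SMOTE's nearest neighbour.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_synthetic):
        a, b = rng.sample(values, 2)  # two distinct samples
        u = rng.random()              # interpolation weight in [0, 1)
        out.append(a + u * (b - a))
    return out


def variance_ratio(n=500, n_synthetic=20000, seed=0):
    """Return (mean shift, variance ratio) of synthetic vs. original values."""
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0) for _ in range(n)]
    synthetic = interpolate_pairs(values, n_synthetic, seed + 1)
    mean_shift = statistics.fmean(synthetic) - statistics.fmean(values)
    ratio = statistics.pvariance(synthetic) / statistics.pvariance(values)
    return mean_shift, ratio
```

Calling `variance_ratio()` should give a mean shift close to zero and a ratio close to 2/3. With true nearest neighbours the endpoints are not independent, so the exact ratio on real data may differ.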

Conclusions:

In high-dimensional settings, k-NN classifiers using Euclidean distance benefit from SMOTE only after variable selection, with more neighbors enhancing the effect.
Using SMOTE with k-NN classifiers on high-dimensional data without prior variable selection is strongly discouraged, because it biases classification heavily towards the minority class.
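The recommended ordering (variable selection first, then SMOTE, then k-NN) can be sketched as a small pipeline. Everything below is illustrative and assumed for the example: the minority class is labelled 1 and the majority class 0, variables are ranked by the absolute difference in class means as a simple stand-in for whatever filter is actually used, and all function names and defaults are hypothetical.

```python
import math
import random
from collections import Counter


def select_variables(X, y, m):
    """Indices of the m features with the largest |difference in class means|."""
    def class_mean(label, j):
        vals = [row[j] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)
    scores = [abs(class_mean(1, j) - class_mean(0, j)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:m]


def smote(minority, n_synthetic, k, rng):
    """Interpolate between minority rows and their k nearest minority neighbours."""
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def fit_pipeline(X, y, m=5, k_smote=5, seed=0):
    """Select variables, then oversample the minority class (label 1) to balance."""
    rng = random.Random(seed)
    keep = select_variables(X, y, m)
    X_reduced = [[row[j] for j in keep] for row in X]
    minority = [row for row, lab in zip(X_reduced, y) if lab == 1]
    n_majority = len(y) - len(minority)
    extra = smote(minority, n_majority - len(minority), k_smote, rng)
    return keep, X_reduced + extra, list(y) + [1] * len(extra)


def predict(model, x, k=3):
    """Classify x by majority vote among its k nearest training rows."""
    keep, X_bal, y_bal = model
    x_reduced = [x[j] for j in keep]
    nearest = sorted(range(len(X_bal)),
                     key=lambda i: math.dist(X_bal[i], x_reduced))[:k]
    return Counter(y_bal[i] for i in nearest).most_common(1)[0][0]
```

The key design point is that `select_variables` runs before `smote`, so both the neighbour search inside SMOTE and the final k-NN vote operate in the reduced variable space rather than on all variables.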