Updated: Dec 24, 2025

Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data.

Guang-Hui Fu¹, Yuan-Jiao Wu², Min-Jie Zong²

¹School of Science, Kunming University of Science and Technology, Kunming, 650500, People's Republic of China. guanghuifu@kust.edu.cn.

BMC Bioinformatics

April 16, 2020

Summary

This summary is machine-generated.

We developed sssHD, a novel feature selection algorithm for high-dimensional, imbalanced data. This method effectively identifies key features and is adaptable for various machine learning tasks.

Keywords:

Class-imbalance learning; Feature selection; Hellinger distance; Sparse regularization

Area of Science:

Machine Learning
Bioinformatics
Data Science

Background:

Class-imbalance learning is crucial for high-dimensional data analysis in various scientific fields.
Traditional feature selection methods struggle with imbalanced datasets, necessitating new approaches.
Effective feature selection improves classification performance and biomarker discovery in complex data.

Purpose of the Study:

To develop a stable and sparse feature selection algorithm for high-dimensional class-imbalanced data.
To address the limitations of existing feature selection techniques in handling imbalanced datasets.
To provide an efficient method for identifying key features in complex biological and scientific data.

Main Methods:

Proposed the sssHD algorithm, combining Hellinger distance (HD) with sparse regularization.
Utilized HD's class-insensitive and translation-invariant properties for robust feature selection.
Evaluated sssHD on simulated data and five gene expression datasets, comparing it with existing methods.
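
To make the Hellinger distance concrete, the sketch below scores how well a single feature separates two classes. It is an illustration under stated assumptions, not the sssHD algorithm itself: it assumes each class's values are roughly normal and uses the standard closed-form Hellinger distance between two univariate normal distributions. The function names (`hellinger_normal`, `class_separation`) and the toy numbers are hypothetical, and the sparse regularization step is not shown.

```python
import math
import statistics


def hellinger_normal(mu0, sigma0, mu1, sigma1):
    """Hellinger distance between N(mu0, sigma0^2) and N(mu1, sigma1^2).

    Closed form: H^2 = 1 - sqrt(2*s0*s1 / (s0^2 + s1^2))
                         * exp(-(mu0 - mu1)^2 / (4 * (s0^2 + s1^2)))
    """
    var_sum = sigma0 ** 2 + sigma1 ** 2
    affinity = math.sqrt(2.0 * sigma0 * sigma1 / var_sum) * math.exp(
        -((mu0 - mu1) ** 2) / (4.0 * var_sum)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - affinity))


def class_separation(values, labels):
    """Estimate the HD between the two class-conditional distributions
    of one feature (normality within each class is an assumption)."""
    neg = [v for v, y in zip(values, labels) if y == 0]
    pos = [v for v, y in zip(values, labels) if y == 1]
    return hellinger_normal(
        statistics.fmean(neg), statistics.stdev(neg),
        statistics.fmean(pos), statistics.stdev(pos),
    )


# Toy imbalanced data (made-up numbers): 8 majority vs. 3 minority samples.
majority = [0.1, -0.4, 0.3, 0.0, -0.2, 0.5, -0.1, 0.2]
minority = [2.1, 1.8, 2.4]
values = majority + minority
labels = [0] * len(majority) + [1] * len(minority)

hd = class_separation(values, labels)
# Translation invariance: shifting every value by a constant leaves HD unchanged.
hd_shifted = class_separation([v + 10.0 for v in values], labels)
# Identical distributions have distance zero.
hd_identical = hellinger_normal(0.0, 1.0, 0.0, 1.0)

print(f"HD = {hd:.4f}, HD after shift = {hd_shifted:.4f}")
```

The closed form uses only each class's own mean and standard deviation, so the class proportions do not enter the distance directly. This is one way to read the class-insensitive property noted above.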

Main Results:

The HD-based selection algorithm effectively identifies key features and controls false discoveries in imbalanced learning.
sssHD demonstrated highly competitive performance across five assessment metrics compared to existing procedures.
The algorithm showed minimal performance differences with or without re-balance preprocessing.

Conclusions:

sssHD is a practical, simple, and generalizable feature selection method for high-dimensional imbalanced data.
The algorithm's flexibility allows extension with different preprocessing, regularization, and classifiers.
sssHD offers a valuable alternative for feature selection in diverse class-imbalanced learning scenarios.