Related Concept Videos

Survival Tree (01:19)

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a...

Frequency-dependent Selection (01:21)

When the fitness of a trait depends on how common it is (i.e., its frequency) relative to other traits within a population, this is referred to as frequency-dependent selection. Frequency-dependent selection may occur between species or within a single species. This type of selection can be either positive, with more common phenotypes having higher fitness, or negative, with rarer phenotypes having higher fitness.

One-Way ANOVA: Unequal Sample Sizes (01:15)

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations become more involved when the sample sizes differ. When performing ANOVA with unequal sample sizes, the following equation is used:

Quantifying and Rejecting Outliers: The Grubbs Test (01:02)

Sometimes a data set contains a recorded numerical observation that deviates greatly from the rest of the data. Assuming that the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the mean. Then calculate the ratio of this difference to the standard deviation of the sample. This...

Expected Frequencies in Goodness-of-Fit Tests (01:19)

A goodness-of-fit test is conducted to determine whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. Suppose the expected frequencies for a dataset are equal, as when predicting how often each number appears when casting a die. In that case, the expected frequency for each category is the ratio of the total number of observations (n) to the number of categories (k).

Outliers and Influential Points (01:08)

An outlier is an observation that does not fit the rest of the data. It is sometimes called an extreme value. When graphed, an outlier appears not to fit the pattern of the rest of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening. Outliers lie far from the least squares line in the vertical direction. They have large "errors," where the "error" or residual is the...
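The Grubbs statistic described in the outlier teaser (largest absolute deviation from the mean, divided by the sample standard deviation) can be sketched in a few lines of standard-library Python. The function name `grubbs_statistic` and the example values are illustrative assumptions. This sketch computes only the statistic; deciding whether the suspect value is an outlier still requires comparing G with a critical value, which is not computed here.

```python
import statistics

def grubbs_statistic(data):
    """Return (G, index) for a two-tailed Grubbs test: the largest absolute
    deviation from the sample mean divided by the sample standard deviation,
    plus the position of the observation that produced it."""
    if len(data) < 3:
        raise ValueError("the Grubbs test needs at least 3 observations")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all observations are identical; G is undefined")
    deviations = [abs(x - mean) for x in data]
    # The suspected outlier is the observation farthest from the mean
    idx = max(range(len(data)), key=deviations.__getitem__)
    return deviations[idx] / sd, idx

values = [1.0, 2.0, 3.0, 4.0, 100.0]
G, suspect = grubbs_statistic(values)
print(f"G = {G:.4f} for value {values[suspect]}")
```

`statistics.stdev` is used rather than `statistics.pstdev` because the teaser refers to the standard deviation of the sample.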

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

From Five-Number Summary to Absolute Heterogeneity: Recent Methodological Advances in Meta-Analysis With Continuous Outcomes.

Journal of evidence-based medicine·2026
Same author

Intelligent quantification of formaldehyde in aquatic product soaking solutions via a novel deep regression framework.

Frontiers in nutrition·2026
Same author

Spatially Correlated Analysis of Infectious Disease Outcomes Based on Bayesian Functional Hierarchical Models.

Statistics in medicine·2026
Same author

A novel robust meta-analysis model using the t distribution for outlier accommodation and detection.

Research synthesis methods·2026
Same author

Meiotic purification of dysfunctional mitochondria in mouse oocytes.

Reproduction (Cambridge, England)·2026
Same author

Integrating multi-stage interventions for harmful algal blooms effective management.

Journal of environmental management·2026
Same journal

OpenIMC: an open-source platform for analyzing single-cell and spatial proteomics by imaging mass cytometry.

BMC bioinformatics·2026
Same journal

NAP: an open source pipeline for cross-domain microbiome profiling using Nanopore sequencing-derived amplicon data.

BMC bioinformatics·2026
Same journal

SurvGME: an R package for survival analysis with graphical and measurement error models.

BMC bioinformatics·2026
Same journal

SimMapNet: a Bayesian framework for gene regulatory network inference using gene ontology similarities as external hint.

BMC bioinformatics·2026
Same journal

Dual channel drug-drug interactions extraction based on cross attention.

BMC bioinformatics·2026
Same journal

FeSseqdb: a curated sequence-level database and interpretable machine learning framework for identifying iron-sulfur proteins.

BMC bioinformatics·2026

Related Experiment Video

Updated: Dec 24, 2025

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances (07:35)

Published on: October 11, 2018

Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data.

Guang-Hui Fu¹, Yuan-Jiao Wu², Min-Jie Zong²

  • ¹School of Science, Kunming University of Science and Technology, Kunming, 650500, People's Republic of China. guanghuifu@kust.edu.cn.

BMC Bioinformatics | April 16, 2020
Summary
This summary is machine-generated.

We developed sssHD, a novel feature selection algorithm for high-dimensional, imbalanced data. This method effectively identifies key features and is adaptable for various machine learning tasks.

Keywords:
Class-imbalance learning; Feature selection; Hellinger distance; Sparse regularization

More Related Videos

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments (08:12)

Published on: March 1, 2022

Assisted Selection of Biomarkers by Linear Discriminant Analysis Effect Size LEfSe in Microbiome Data (04:57)

Published on: May 16, 2022

Area of Science:

  • Machine Learning
  • Bioinformatics
  • Data Science

Background:

  • Class-imbalance learning is crucial for high-dimensional data analysis in various scientific fields.
  • Traditional feature selection methods struggle with imbalanced datasets, necessitating new approaches.
  • Effective feature selection improves classification performance and biomarker discovery in complex data.

Purpose of the Study:

  • To develop a stable and sparse feature selection algorithm for high-dimensional class-imbalanced data.
  • To address the limitations of existing feature selection techniques in handling imbalanced datasets.
  • To provide an efficient method for identifying key features in complex biological and scientific data.

Main Methods:

  • Proposed the sssHD algorithm, combining Hellinger distance (HD) with sparse regularization.
  • Exploited the insensitivity of HD to class imbalance, together with its translation invariance, for robust feature selection.
  • Evaluated sssHD on simulated data and five gene expression datasets, comparing it with existing methods.
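The summary does not say how HD is computed for a single feature. A common approach in imbalanced-learning feature selection, assumed here and not confirmed as the exact sssHD procedure, is to discretize each feature into bins and compare the minority-class and majority-class histograms p and q with H(p, q) = sqrt(sum_j (sqrt(p_j) - sqrt(q_j))^2). The sketch below scores and ranks features this way. The function name `hellinger_scores`, the 10 equal-width bins and the 1/0 label coding are illustrative assumptions, and the sparse-regularization and stability steps of sssHD are not reproduced.

```python
import numpy as np

def hellinger_scores(X, y, n_bins=10):
    """Score each feature (column of X) by the Hellinger distance between its
    class-conditional histograms. Assumes binary labels coded 1 (minority)
    and 0 (majority), with both classes present."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        lo, hi = col.min(), col.max()
        if lo == hi:  # a constant feature cannot separate the classes
            continue
        # Equal-width bins over the pooled range of this feature
        edges = np.linspace(lo, hi, n_bins + 1)
        # Normalize each histogram by its own class size, not the total size
        p = np.histogram(col[y == 1], bins=edges)[0] / np.sum(y == 1)
        q = np.histogram(col[y == 0], bins=edges)[0] / np.sum(y == 0)
        scores[j] = np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
    return scores

# Toy imbalanced data: 20 minority vs 200 majority samples, three features
rng = np.random.default_rng(0)
y = np.r_[np.ones(20, dtype=int), np.zeros(200, dtype=int)]
informative = np.r_[rng.normal(3.0, 1.0, 20), rng.normal(0.0, 1.0, 200)]
noise = rng.normal(0.0, 1.0, 220)
constant = np.full(220, 5.0)
X = np.column_stack([informative, noise, constant])

scores = hellinger_scores(X, y)
ranking = np.argsort(scores)[::-1]  # feature indices, highest HD first
print(scores, ranking)
```

With this definition H ranges from 0 (identical histograms) to sqrt(2) (no overlapping bins); some authors include a factor of 1/sqrt(2) so that the range is [0, 1], which does not change the ranking.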

Main Results:

  • The HD-based selection algorithm effectively identifies key features and controls false discoveries in imbalanced learning.
  • sssHD performed competitively with existing procedures across five assessment metrics.
  • The algorithm's performance differed little with or without re-balancing preprocessing.
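One way to see why re-balancing has little effect on an HD-based score is this: a binned HD compares class histograms that are each normalized by their own class size, so the class proportions drop out. The sketch below assumes the same binned HD as a per-feature score (an assumption, not the paper's stated formula). It re-balances a 15-vs-300 feature by replicating every minority sample 20 times, a crude oversampling scheme chosen for illustration, and recomputes HD. It illustrates a property of this sketch and does not reproduce the paper's experiments, which ran full classifiers. Random oversampling with replacement would change HD slightly rather than leave it exactly unchanged.

```python
import numpy as np

def hellinger_binned(a, b, n_bins=10):
    """Hellinger distance between the histograms of samples a and b, using
    equal-width bins over their pooled range (assumed binning scheme)."""
    pooled = np.concatenate([a, b])
    lo, hi = pooled.min(), pooled.max()
    if lo == hi:
        return 0.0
    edges = np.linspace(lo, hi, n_bins + 1)
    p = np.histogram(a, bins=edges)[0] / len(a)  # normalized by a's own size
    q = np.histogram(b, bins=edges)[0] / len(b)  # normalized by b's own size
    return float(np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

rng = np.random.default_rng(1)
minority = rng.normal(1.5, 1.0, 15)   # 15 minority samples of one feature
majority = rng.normal(0.0, 1.0, 300)  # 300 majority samples (20:1 imbalance)

# Crude re-balancing: replicate each minority sample 20 times (15 -> 300)
balanced_minority = np.repeat(minority, 20)

hd_imbalanced = hellinger_binned(minority, majority)
hd_rebalanced = hellinger_binned(balanced_minority, majority)
print(hd_imbalanced, hd_rebalanced)  # identical up to floating-point rounding
```

Replication leaves both the pooled range (and thus the bin edges) and the minority-class proportions per bin unchanged, so the two printed distances agree.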

Conclusions:

  • sssHD is a practical, simple, and generalizable feature selection method for high-dimensional imbalanced data.
  • The algorithm is flexible enough to be extended with different preprocessing steps, regularization methods, and classifiers.
  • Offers a valuable alternative for feature selection in diverse class-imbalanced learning scenarios.