
EPS: automated feature selection in case-control studies using extreme pseudo-sampling.

Ruhollah Shemirani (1), Stephane Wenric (2), Eimear Kenny (2)

  • (1) Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA.

Bioinformatics (Oxford, England) | March 28, 2021
Summary
This summary is machine-generated.

The Extreme Pseudo-Sampling (EPS) algorithm selects features in high-dimensional biological case-control data by combining deep learning with regression. The open-source tool generates pseudo-samples and trains a model on them, with the aim of improving predictive accuracy.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Machine Learning

Background:

  • High-dimensional biological case-control datasets present challenges for identifying informative predictive features.
  • Effective feature selection is crucial for accurate biological data analysis and interpretation.

Purpose of the Study:

  • To introduce and evaluate the Extreme Pseudo-Sampling (EPS) algorithm for feature selection in high-dimensional biological data.
  • To present an enhanced, open-source Python implementation of the EPS algorithm with improved customizability.

Main Methods:

  • The EPS algorithm combines deep learning (variational autoencoder) with logistic regression to generate latent sample representations.
  • It creates pseudo-samples around extreme cases and controls to augment the dataset.
  • Feature significance is determined by training a regression model on the upsampled data.
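
The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation or the API of their package. To stay dependency-free it omits the variational autoencoder entirely and works in the original feature space, so the "latent representation" here is only a logistic-regression score. All function names, parameters (`n_extreme`, `n_pseudo`, `noise`), the Gaussian jitter used to create pseudo-samples, and the toy data are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient descent for unregularized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, x):
    # Predicted probability of being a case.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def extreme_pseudo_samples(X, y, w, b, n_extreme, n_pseudo, noise, rng):
    """Select the lowest-scoring controls and highest-scoring cases,
    then create jittered copies (pseudo-samples) around each of them."""
    ranked = sorted(range(len(X)), key=lambda i: score(w, b, X[i]))
    controls = [i for i in ranked if y[i] == 0][:n_extreme]
    cases = [i for i in reversed(ranked) if y[i] == 1][:n_extreme]
    Xp, yp = [], []
    for i in controls + cases:
        for _ in range(n_pseudo):
            Xp.append([v + rng.gauss(0.0, noise) for v in X[i]])
            yp.append(y[i])
    return Xp, yp


def rank_features(X, y, n_extreme=10, n_pseudo=20, noise=0.1, seed=0):
    """Rank features by absolute coefficient of a model trained on pseudo-samples."""
    rng = random.Random(seed)
    w, b = fit_logistic(X, y)  # stand-in for the VAE + logistic regression step
    Xp, yp = extreme_pseudo_samples(X, y, w, b, n_extreme, n_pseudo, noise, rng)
    w2, _ = fit_logistic(Xp, yp)  # regression on the upsampled data
    order = sorted(range(len(w2)), key=lambda j: -abs(w2[j]))
    return order, w2


def make_toy_data(n=200, d=5, seed=1):
    # Hypothetical toy data: only feature 0 separates cases from controls.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        row[0] += 2.0 if label == 1 else -2.0
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    order, weights = rank_features(X, y)
    print("features ranked by |coefficient|:", order)
```

Ranking by absolute coefficient is only one possible reading of "feature significance"; the published method may use a different statistic, and real data would normally be standardized before coefficients are compared.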

Main Results:

  • The EPS algorithm effectively identifies significant predictive features in complex biological datasets.
  • The open-source implementation offers enhanced customizability for data preparation, model training, and classification.

Conclusions:

  • The Extreme Pseudo-Sampling algorithm provides a robust solution for feature selection in challenging high-dimensional biological data.
  • The enhanced open-source package facilitates wider adoption and application across diverse biological datasets.