



Evaluation of variable selection methods for random forests and omics data sets.

Frauke Degenhardt (1), Stephan Seifert (2), Silke Szymczak (2)

  • (1) Institute of Clinical Molecular Biology, Kiel University, Germany.

Briefings in Bioinformatics | October 19, 2017
Summary
This summary is machine-generated.

For high-dimensional omics data, the Boruta and Vita variable selection methods are recommended. Vita is faster and therefore better suited to large datasets, while Boruta can also be applied in low-dimensional settings.

Keywords:
feature selection, high dimensional data, machine learning, random forest, relevant variables


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Machine learning, particularly random forests, shows promise for analyzing high-dimensional omics data.
  • Variable importance measures are crucial for ranking predictors in omics studies.
  • Selecting relevant variables is key for identifying active biological networks and pathways.

Purpose of the Study:

  • To evaluate and compare the performance of various variable selection procedures for high-dimensional omics data.
  • To identify the most effective methods for variable selection in both simulated and experimental datasets.
  • To provide recommendations for choosing appropriate variable selection techniques based on study objectives and data characteristics.

Main Methods:

  • Comparison of five variable selection procedures: the Boruta algorithm, the Vita method, recurrent relative variable importance (r2VIM), the permutation approach of Altmann, and recursive feature elimination (RFE).
  • Evaluation using simulated datasets to assess power and stability.
  • Analysis of publicly available experimental methylation and gene expression data.
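
The core idea behind the Vita method can be illustrated with a short sketch. It assumes that importance scores of irrelevant variables scatter symmetrically around zero, so the non-positive scores and their mirror images can serve as an empirical null distribution. The function name `vita_p_values`, the variable names, and the importance values below are hypothetical; in practice the scores would come from a random forest (for example, cross-validated permutation importance), which this sketch does not compute.

```python
from bisect import bisect_right


def vita_p_values(importances):
    """Empirical p-values from a null distribution mirrored around zero.

    `importances` maps variable names to importance scores (assumed input).
    """
    negatives = [v for v in importances.values() if v < 0]
    zeros = [v for v in importances.values() if v == 0]
    # Null sample: negative scores, zero scores, and the mirrored negatives.
    null = sorted(negatives + zeros + [-v for v in negatives])
    if not null:
        raise ValueError("no non-positive importance scores; null sample is empty")
    n = len(null)
    # p = 1 - F0(v), where F0 is the empirical CDF (share of null values <= v).
    return {name: 1.0 - bisect_right(null, v) / n
            for name, v in importances.items()}


# Hypothetical importance scores for six variables.
scores = {"g1": 0.9, "g2": 0.05, "g3": -0.1, "g4": 0.0, "g5": -0.2, "g6": 0.15}
p_values = vita_p_values(scores)
selected = [name for name, p in p_values.items() if p < 0.05]
print(selected)
```

With only a handful of non-positive scores the null distribution is very coarse, which is one reason an approach of this kind is aimed at high-dimensional data, where many variables are expected to be unrelated to the outcome.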

Main Results:

  • In the simulation studies, Boruta had the highest power, closely followed by Vita.
  • Boruta and Vita showed similar stability in variable selection in the simulations.
  • Vita was the most robust method under a null model, in which no predictor is related to the outcome.
  • On the experimental datasets, Vita was slightly more stable and less computationally intensive than Boruta.
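
The summary does not state how stability was quantified. As a minimal sketch, assuming stability is measured as the mean pairwise Jaccard index between the variable sets selected on repeated runs or replicate datasets, it could be computed as follows. The function names and the example selections are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty selections are treated as identical by convention here.
    return 1.0 if not union else len(a & b) / len(union)


def mean_pairwise_jaccard(selections):
    """Average Jaccard index over all pairs of selected-variable sets."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections to assess stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three replicate analyses.
runs = [{"g1", "g2"}, {"g1", "g2"}, {"g1"}]
print(mean_pairwise_jaccard(runs))
```

A value of 1 means every run selected exactly the same variables; values near 0 mean the selections barely overlap.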

Conclusions:

  • Boruta and Vita are recommended for high-dimensional data analysis.
  • Vita is preferable for large datasets because it is considerably faster.
  • Boruta can also be applied in low-dimensional settings and had the highest power in the simulations.
  • The choice between Boruta and Vita depends on dataset size and specific analytical goals.