Unbiased feature selection in learning random forests for high-dimensional data.

Thanh-Tung Nguyen¹, Joshua Zhexue Huang², Thuy Thi Nguyen³

  • ¹ Shenzhen Key Laboratory of High Performance Data Mining, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China; School of Computer Science and Engineering, Water Resources University, Hanoi 10000, Vietnam.

TheScientificWorldJournal | April 17, 2015
Summary
This summary is machine-generated.

This study introduces xRF, an enhanced random forest (RF) algorithm that improves classification accuracy on high-dimensional data. xRF effectively debiases feature selection, leading to better performance than standard RFs.

Area of Science:

  • Machine Learning
  • Data Science
  • Computer Science

Background:

  • Random forests (RFs) are widely used for classification, but their accuracy drops on high-dimensional data, where many features are uninformative.
  • RF feature selection is biased toward multi-valued features, which hurts performance.
  • Because both the bagged samples and the candidate features are randomized, trees often pick uninformative features for node splitting.
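
The bias toward multi-valued features can be illustrated with a small simulation. The sketch below is not taken from the study: it assumes Gini impurity as the split criterion, and the function names and parameters (`best_gini_gain`, `mean_gain`, `n_levels`) are hypothetical. Both features are pure noise, yet the one with more distinct values offers more candidate thresholds and so tends to reach a larger apparent gain.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_gini_gain(values, labels):
    """Largest impurity reduction over all threshold splits of one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # thresholds between distinct values
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def mean_gain(n_levels, n_samples=60, n_trials=200, seed=0):
    """Average best gain of a pure-noise feature with n_levels distinct values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        labels = [rng.randint(0, 1) for _ in range(n_samples)]
        values = [rng.randrange(n_levels) for _ in range(n_samples)]
        total += best_gini_gain(values, labels)
    return total / n_trials


# Neither feature carries any signal about the labels.
gain_binary = mean_gain(n_levels=2)
gain_many = mean_gain(n_levels=20)
print(f"binary noise feature:    {gain_binary:.4f}")
print(f"20-valued noise feature: {gain_many:.4f}")
```

A splitter that simply takes the feature with the highest gain would therefore prefer the 20-valued noise feature, which is the kind of bias a debiasing step aims to correct.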

Purpose of the Study:

  • To propose a novel RF algorithm, xRF, for improved feature selection in high-dimensional datasets.
  • To enhance the accuracy and debias the feature selection process of RFs.
  • To reduce the dimensionality and the amount of data needed to learn RFs.

Main Methods:

  • Uninformative features are first identified by p-value assessment and removed.
  • A subset of unbiased features is then selected from the remaining features using statistical measures.
  • This subset is partitioned into two subsets, and a feature weighting sampling technique draws features from both when building each tree.
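
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a permutation test on the gap between class means as a stand-in for the p-value assessment, the thresholds `alpha` and `strong_alpha` are hypothetical, the sampling weight is assumed to be 1 minus the p-value, and every function name is illustrative.

```python
import random
from statistics import mean


def relevance(column, labels):
    """Toy relevance score: absolute gap between the two class means."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def permutation_p_value(column, labels, n_perm, rng):
    """Fraction of label shuffles whose score reaches the observed one."""
    observed = relevance(column, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if relevance(column, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def filter_and_partition(columns, labels, alpha=0.05, strong_alpha=0.01,
                         n_perm=199, seed=0):
    """Drop features with p > alpha; split the rest into strong and weak."""
    rng = random.Random(seed)
    p_values = [permutation_p_value(c, labels, n_perm, rng) for c in columns]
    strong = [j for j, p in enumerate(p_values) if p <= strong_alpha]
    weak = [j for j, p in enumerate(p_values) if strong_alpha < p <= alpha]
    return strong, weak, p_values


def weighted_sample(features, weights, k, rng):
    """Draw up to k distinct features, each pick proportional to its weight."""
    pool = list(features)
    chosen = []
    while pool and len(chosen) < k:
        pick = rng.choices(pool, weights=[weights[j] for j in pool], k=1)[0]
        pool.remove(pick)
        chosen.append(pick)
    return chosen


def sample_for_tree(strong, weak, p_values, mtry, rng):
    """Pick up to mtry candidate features for one tree from both subsets."""
    weights = [1.0 - p for p in p_values]  # assumed weight: 1 - p-value
    n_strong = min(len(strong), (mtry + 1) // 2)
    picked = weighted_sample(strong, weights, n_strong, rng)
    picked += weighted_sample(weak, weights, mtry - len(picked), rng)
    return picked


# Toy data: feature 0 tracks the label, features 1-3 are noise.
rng = random.Random(42)
labels = [0] * 20 + [1] * 20
informative = [3.0 * y + rng.gauss(0.0, 0.5) for y in labels]
noise = [[rng.gauss(0.0, 1.0) for _ in labels] for _ in range(3)]
columns = [informative] + noise

strong, weak, p_values = filter_and_partition(columns, labels)
candidates = sample_for_tree(strong, weak, p_values, mtry=2, rng=rng)
print("strong:", strong, "weak:", weak, "candidates:", candidates)
```

In a full forest, `sample_for_tree` would be called once per tree (or per node), and the chosen indices would be passed to an ordinary tree learner. Drawing from the two subsets separately keeps some weaker features in play instead of letting the strongest ones dominate every tree.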

Main Results:

  • RFs built with the proposed xRF approach outperformed existing random forest methods.
  • Across experiments on 47 high-dimensional real-world datasets, xRF increased both accuracy and AUC.
  • The approach effectively handles high-dimensional data, including image datasets.

Conclusions:

  • xRF offers a robust solution for debiasing feature selection in random forests.
  • The algorithm enhances classification accuracy and AUC for high-dimensional data.
  • xRF makes learning more efficient by reducing the dimensionality and the amount of data needed.