Variable importance-weighted Random Forests.

Yiyi Liu¹, Hongyu Zhao¹,²

¹Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511, USA.

Quantitative Biology (Beijing, China)

July 24, 2018

Summary

This summary is machine-generated.

Variable importance-weighted Random Forests improve prediction accuracy by sampling features based on their importance scores. This method enhances performance in classification and regression tasks, especially with noisy data.

Keywords:

Random Forests; classification; regression; variable importance score

Area of Science:

Bioinformatics
Machine Learning in Biology

Background:

Random Forests (RF) is a widely used classification and regression algorithm in biological studies.
RF performance declines when the number of features is large, which has motivated variants such as feature elimination RF.
Existing variants are limited by rigid feature selection and by increased correlation between trees.

Purpose of the Study:

To propose a novel Random Forests approach that enhances prediction accuracy.
To address the performance degradation of standard RF with high dimensionality.
To improve feature selection by incorporating variable importance scores.

Main Methods:

Introduced variable importance-weighted Random Forests (viRF).
At each node, viRF samples candidate features with probability weighted by their importance scores, whereas standard RF samples features with equal probability.
The best split is selected from these importance-weighted sampled features.
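
The node-level step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation and not the API of the 'viRandomForests' R package. The function names (`sample_features`, `best_split`), the Gini impurity criterion, the binary 0/1 labels, the small weight floor, and the toy data are all assumptions made for illustration, and the summary does not state how the importance scores themselves are computed.

```python
import random


def sample_features(importances, mtry, rng):
    """Draw `mtry` distinct feature indices, each with probability
    proportional to its importance score (hypothetical helper)."""
    # Assumption: scores are non-negative; a tiny floor keeps every
    # feature eligible even when its score is zero.
    weights = [max(w, 0.0) + 1e-12 for w in importances]
    candidates = list(range(len(weights)))
    chosen = []
    for _ in range(min(mtry, len(candidates))):
        # Weighted draw of one position among the remaining candidates.
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)  # sample without replacement
    return chosen


def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(X, y, features):
    """Return (feature, threshold, weighted impurity) of the best split
    found among the given candidate features, or None if no split exists."""
    best = None
    n = len(y)
    for j in features:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: feature 0 separates the classes, features 1 and 2 are noise.
    X = [[1, 5, 1], [2, 3, 0], [8, 4, 1], [9, 2, 0]]
    y = [0, 0, 1, 1]
    importances = [0.9, 0.05, 0.05]  # made-up scores for illustration
    features = sample_features(importances, mtry=2, rng=rng)
    print("candidate features:", features)
    print("best split:", best_split(X, y, features))
```

Setting all importance scores equal reduces `sample_features` to the equal-probability sampling of standard RF, which makes the contrast between the two schemes easy to try on the same data.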

Main Results:

viRF demonstrated improved performance over standard RF and feature elimination RF in simulations and real-world data.
The method showed enhanced accuracy in both classification and regression tasks.
Performance gains were observed particularly in the presence of weak signals and high noise.

Conclusions:

Incorporating variable importance into feature selection improves utilization of informative features.
viRF offers better prediction accuracy by balancing informative and less informative features.
An R package 'viRandomForests' is available for public use.