



Variable importance-weighted Random Forests.

Yiyi Liu1, Hongyu Zhao1,2

  • 1Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06511, USA.

Quantitative Biology (Beijing, China) | July 24, 2018
PubMed
Summary
This summary is machine-generated.

Variable importance-weighted Random Forests improve prediction accuracy by sampling features based on their importance scores. This method enhances performance in classification and regression tasks, especially with noisy data.

Keywords:
Random Forests, classification, regression, variable importance score

Area of Science:

  • Bioinformatics
  • Machine Learning in Biology

Background:

  • Random Forests (RF) is a widely used classification and regression algorithm in biological studies.
  • RF performance declines when the number of features is large, which has motivated variants such as feature elimination RF.
  • These existing methods are limited by rigid feature selection and increased correlation between trees.
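
One common explanation for this decline, which the summary itself does not spell out, is that standard RF draws a small, uniformly random subset of features at each node, so informative features are rarely among the candidates when most features are noise. The sketch below illustrates that reasoning under the assumption of uniform sampling without replacement; the function name and the parameters `p`, `s`, and `mtry` are hypothetical and not taken from the paper.

```python
from math import comb


def prob_informative_in_sample(p, s, mtry):
    """Probability that at least one of s informative features appears
    among mtry features drawn uniformly without replacement from p features.

    This is 1 minus the hypergeometric probability of drawing zero
    informative features.
    """
    return 1.0 - comb(p - s, mtry) / comb(p, mtry)


# Illustrative comparison: same number of informative features,
# with mtry set near sqrt(p) in both cases.
low_dim = prob_informative_in_sample(p=100, s=10, mtry=10)
high_dim = prob_informative_in_sample(p=1000, s=10, mtry=31)
print(f"p=100:  {low_dim:.3f}")
print(f"p=1000: {high_dim:.3f}")
```

With the number of informative features held fixed, the chance that a node sees any of them shrinks as the total number of features grows, which is the gap that importance-based sampling aims to close.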

Purpose of the Study:

  • To propose a novel Random Forests approach that enhances prediction accuracy.
  • To address the performance degradation of standard RF with high dimensionality.
  • To improve feature selection by incorporating variable importance scores.

Main Methods:

  • Introduced variable importance-weighted Random Forests (viRF).
  • viRF samples features based on their importance scores at each node, unlike standard RF's equal probability sampling.
  • The best split is selected from these importance-weighted sampled features.
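
The node-level procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation or the interface of the 'viRandomForests' R package: the function names, the Gini impurity criterion, the clipping of negative scores, and the small probability floor are all assumptions made for the example.

```python
import numpy as np


def sample_features(importance, mtry, rng):
    """Draw mtry distinct feature indices with probability proportional
    to their importance scores (standard RF would use equal probabilities)."""
    # Assumption: negative scores are treated as zero.
    w = np.clip(np.asarray(importance, dtype=float), 0.0, None)
    # Assumption: a tiny floor keeps every probability nonzero so that
    # sampling without replacement is always possible.
    w = w + 1e-12
    return rng.choice(len(w), size=mtry, replace=False, p=w / w.sum())


def gini(y):
    """Gini impurity of a vector of non-negative integer class labels."""
    if len(y) == 0:
        return 0.0
    q = np.bincount(y) / len(y)
    return 1.0 - np.sum(q ** 2)


def best_split(X, y, features):
    """Return (feature, threshold, weighted impurity) of the best split
    found among the candidate features only."""
    best = (None, None, np.inf)
    n = len(y)
    for j in features:
        values = np.unique(X[:, j])
        # Candidate thresholds are midpoints between sorted distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (int(j), float(t), float(score))
    return best


# Toy data: 50 features, only feature 3 determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 3] > 0).astype(int)
importance = np.full(50, 0.01)
importance[3] = 1.0  # hypothetical importance scores

candidates = sample_features(importance, mtry=7, rng=rng)
print(best_split(X, y, candidates))
```

In a full forest this sampling step would be repeated at every node of every tree. How the importance scores are obtained is outside the scope of this sketch.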

Main Results:

  • viRF demonstrated improved performance over standard RF and feature elimination RF in simulations and real-world data.
  • The method showed enhanced accuracy in both classification and regression tasks.
  • Performance gains were observed particularly in the presence of weak signals and high noise.

Conclusions:

  • Incorporating variable importance into feature selection improves utilization of informative features.
  • viRF offers better prediction accuracy by balancing informative and less informative features.
  • An R package 'viRandomForests' is available for public use.