Splitting on categorical predictors in random forests.

Marvin N Wright¹, Inke R König²

¹Leibniz Institute for Prevention Research and Epidemiology-BIPS, Bremen, Germany.

February 13, 2019

Summary

This summary is machine-generated.

Random Forests (RFs) can handle nominal predictors efficiently by ordering their categories, which substantially reduces computational complexity. Applied a priori, this heuristic matches the performance of the standard approach while speeding up analysis for multiclass classification and survival prediction.

Keywords:

Categorical predictors
Classification
Random forest
Survival analysis

Area of Science:

Machine Learning
Computational Statistics
Data Mining

Background:

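Random Forests (RFs) owe part of their success to requiring little data preprocessing.
The standard approach for nominal predictors evaluates every 2-partition of the categories, which is computationally expensive because the number of partitions grows exponentially with the number of categories.
This exponential growth increases computational complexity and limits the number of categories that implementations can handle.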
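
To make the cost of exhaustive 2-partitions concrete, the following minimal sketch counts candidate splits for a nominal predictor with k categories. It relies on the standard combinatorial result that k categories admit 2^(k-1) - 1 distinct 2-partitions, whereas an ordered predictor admits only k - 1 split points. The function names are illustrative and do not come from the paper or any library.

```python
from itertools import combinations


def count_two_partitions(k: int) -> int:
    """Number of ways to split k categories into two non-empty groups."""
    return 2 ** (k - 1) - 1


def count_ordered_splits(k: int) -> int:
    """Number of split points when the k categories are treated as ordered."""
    return k - 1


def enumerate_two_partitions(categories):
    """List every 2-partition explicitly (feasible only for small k).

    Assumes the categories are distinct.
    """
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    partitions = []
    # Keep the first category on the left so each partition is listed once.
    for r in range(len(rest) + 1):
        for subset in combinations(rest, r):
            left = {first, *subset}
            right = set(cats) - left
            if right:  # skip the trivial split with an empty side
                partitions.append((left, right))
    return partitions


for k in (4, 10, 20):
    print(k, count_two_partitions(k), count_ordered_splits(k))
```

The gap between the two counts widens quickly as k grows, which is why exhaustive partitioning becomes impractical for predictors with many categories.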

Purpose of the Study:

To develop an efficient method for handling nominal predictors in Random Forests.
To reduce computational complexity and overcome category limitations in RFs.
To improve the performance and efficiency of RFs for multiclass classification and survival prediction.

Main Methods:

Proposed a heuristic that orders the categories of a nominal predictor, using principal component analysis for multiclass classification and log-rank scores for survival prediction.
Categories can be ordered in every split or a priori, that is, once before growing the forest.
Compared the proposed ordering methods against standard 2-partitions, dummy coding, and ignoring the nominal nature of the predictors.
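
The idea of a priori ordering can be sketched as follows: compute one score per category, sort the categories by that score once, and recode the predictor as integer ranks so that it can be split like an ordinal variable. As a simplifying assumption, this sketch scores each category by its mean numeric outcome, the classical ordering for regression and binary outcomes. It is a stand-in for, not an implementation of, the principal component and log-rank scores listed above. All names are hypothetical.

```python
from collections import defaultdict


def order_categories_a_priori(categories, outcomes):
    """Return the distinct categories sorted by mean outcome.

    The mean outcome is an illustrative score only; it is not the
    PCA-based or log-rank-based score described in the text.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, outcomes):
        totals[cat] += y
        counts[cat] += 1
    means = {cat: totals[cat] / counts[cat] for cat in counts}
    # Break ties by label so the ordering is deterministic.
    return sorted(means, key=lambda cat: (means[cat], cat))


def encode_as_ordinal(categories, ordered_levels):
    """Replace each category by its rank in the precomputed ordering."""
    rank = {cat: i for i, cat in enumerate(ordered_levels)}
    return [rank[cat] for cat in categories]


# Toy data: one nominal predictor and a numeric outcome.
colors = ["red", "blue", "green", "blue", "red", "green"]
y = [3.0, 1.0, 2.0, 1.5, 2.5, 2.0]

levels = order_categories_a_priori(colors, y)  # computed once, before any tree is grown
codes = encode_as_ordinal(colors, levels)      # integer codes usable as an ordinal predictor
print(levels, codes)
```

Because the ordering is computed only once, every tree in the forest can then search the k - 1 ordered split points instead of all 2-partitions.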

Main Results:

Ordering nominal predictor categories a priori is computationally faster than the standard approach.
The proposed method achieves comparable or better prediction performance across various datasets.
The heuristic ordering effectively treats nominal predictors as ordinal, speeding up computation.

Conclusions:

A priori ordering of nominal predictor categories is a computationally efficient and effective enhancement for Random Forests.
This method matches the performance of standard approaches while significantly reducing computational load.
The authors recommend a priori ordering as the default approach for Random Forests.