Unsupervised random forests.

Alejandro Mantero¹, Hemant Ishwaran¹

¹Division of Biostatistics, University of Miami, Miami, Florida, USA.

Statistical Analysis and Data Mining

April 9, 2021

Summary

This summary is machine-generated.

sidClustering is a novel random forests algorithm for unsupervised machine learning. This method effectively identifies clusters using both categorical and continuous variables, retaining key random forest advantages.

Keywords:

Impurity; sidClustering; staggered interaction data; unsupervised learning

Area of Science:

Machine Learning
Bioinformatics
Data Mining

Background:

Unsupervised machine learning algorithms are crucial for identifying patterns in complex datasets.
Existing methods may struggle with mixed data types (categorical and continuous).

Purpose of the Study:

Introduce sidClustering, a new random forests-based unsupervised machine learning algorithm.
Demonstrate its effectiveness in identifying clusters from mixed data types.

Main Methods:

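sidClustering first sidifies the features (sid: staggered interaction data): their ranges are staggered so that they are mutually exclusive, and interaction features are built from the staggered features.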
A multivariate random forest model predicts these sidified features.
Multivariate impurity splitting is utilized for cluster identification.
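
The methods listed above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes continuous features only, orders columns by range, staggers them with a hypothetical `gap` parameter, and uses pairwise products as the interaction features. The impurity shown is a plain sum of per-column variances, which is one simple choice of multivariate impurity and may differ from the rule used in the paper. The function names `sidify`, `multivariate_impurity`, and `best_split` are illustrative.

```python
from itertools import combinations


def sidify(rows, gap=1.0):
    """Stagger columns into mutually exclusive ranges, then add pairwise products.

    rows: list of equal-length lists of numbers (continuous features only).
    gap:  hypothetical spacing between consecutive ranges (an assumption).
    Returns (main, interactions), each a list of rows.
    """
    n_cols = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(n_cols)]
    # Assumed ordering: narrowest range first.
    order = sorted(range(n_cols), key=lambda j: max(cols[j]) - min(cols[j]))
    staggered = {}
    floor = gap  # keeps every staggered value strictly positive
    for j in order:
        lo = min(cols[j])
        staggered[j] = [v - lo + floor for v in cols[j]]
        floor = max(staggered[j]) + gap  # next range starts above this one
    main = [[staggered[j][i] for j in range(n_cols)] for i in range(len(rows))]
    pairs = list(combinations(range(n_cols), 2))
    inter = [[row[a] * row[b] for a, b in pairs] for row in main]
    return main, inter


def multivariate_impurity(targets):
    """Sum of per-column variances over the rows in a node (assumed impurity)."""
    n = len(targets)
    total = 0.0
    for col in zip(*targets):
        mean = sum(col) / n
        total += sum((v - mean) ** 2 for v in col) / n
    return total


def best_split(x, targets):
    """Return (score, threshold) minimizing the size-weighted child impurity."""
    best = (float("inf"), None)
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [y for v, y in zip(x, targets) if v <= t]
        right = [y for v, y in zip(x, targets) if v > t]
        score = (len(left) * multivariate_impurity(left)
                 + len(right) * multivariate_impurity(right)) / len(x)
        if score < best[0]:
            best = (score, t)
    return best


# Toy data: two features with very different ranges.
rows = [[0.0, 10.0], [1.0, 30.0], [2.0, 20.0]]
main, inter = sidify(rows)
# Split the first original feature against the sidified targets.
targets = [m + i for m, i in zip(main, inter)]
score, threshold = best_split([r[0] for r in rows], targets)
```

In this sketch the original features play the role of predictors and the sidified features play the role of the multivariate response, matching the description above. A full forest would repeat such splits over bootstrap samples and random feature subsets, which is omitted here.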

Main Results:

The sidification process is unique and reproducible.
sidClustering successfully identifies clusters arising from both categorical and continuous variables.
The algorithm retains the advantages of traditional random forests.

Conclusions:

sidClustering offers a robust approach for unsupervised clustering with mixed data types.
The method is validated on simulated and real-world datasets, including cancer and cardiovascular patient data.