
Unsupervised random forests.

Alejandro Mantero¹, Hemant Ishwaran¹

  • ¹ Division of Biostatistics, University of Miami, Miami, Florida, USA.

Statistical Analysis and Data Mining | April 9, 2021
Summary
This summary is machine-generated.

sidClustering is a novel random forests algorithm for unsupervised machine learning. It effectively identifies clusters in data containing both categorical and continuous variables while retaining key advantages of random forests.

Keywords:
Impurity; sidClustering; staggered interaction data; unsupervised learning

Area of Science:

  • Machine Learning
  • Bioinformatics
  • Data Mining

Background:

  • Unsupervised machine learning algorithms are crucial for identifying patterns in complex datasets.
  • Existing methods may struggle with mixed data types (categorical and continuous).

Purpose of the Study:

  • Introduce sidClustering, a new random forests-based unsupervised machine learning algorithm.
  • Demonstrate its effectiveness in identifying clusters from mixed data types.

Main Methods:

  • sidClustering "sidifies" the features (staggered interaction data): features are shifted into mutually exclusive ranges, and interaction features are created from them.
  • A multivariate random forest is trained to predict these sidified features.
  • Multivariate impurity splitting is used to identify clusters.
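The sidification and splitting steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: the helper names `sidify`, `mv_split_score` and `best_split` are hypothetical; categorical levels are coded as integers in sorted order; each pair of staggered features is combined into an interaction feature by multiplication; and the multivariate impurity is taken to be the sum, over responses, of each response's variance reduction divided by its node variance (one common composite criterion for multivariate regression trees). Growing the full multivariate forest and turning it into clusters is omitted.

```python
from itertools import combinations


def sidify(columns):
    """Hypothetical sketch of sidification (staggered interaction data).

    columns: dict mapping feature name -> list of values (equal lengths).
    Returns (staggered, interactions), both dicts of name -> list of floats.
    """
    staggered = {}
    offset = 0.0
    for name, values in columns.items():
        if all(isinstance(v, (int, float)) for v in values):
            numeric = [float(v) for v in values]
        else:
            # Assumption: categorical levels are coded 0, 1, 2, ... in sorted order.
            levels = sorted(set(values), key=str)
            code = {level: i for i, level in enumerate(levels)}
            numeric = [float(code[v]) for v in values]
        lo, hi = min(numeric), max(numeric)
        # Shift so the feature is strictly positive and starts above the
        # previous feature's maximum, giving mutually exclusive ranges.
        staggered[name] = [v - lo + offset + 1.0 for v in numeric]
        offset += (hi - lo) + 1.0
    interactions = {}
    for a, b in combinations(staggered, 2):
        # Assumption: an interaction feature is the product of two staggered features.
        interactions[f"{a}*{b}"] = [x * y for x, y in zip(staggered[a], staggered[b])]
    return staggered, interactions


def _variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


def mv_split_score(x, responses, threshold):
    """Composite multivariate impurity decrease for the split x <= threshold.

    Each response's variance reduction is divided by its node variance so that
    responses on different scales contribute comparably (an assumption).
    """
    left = [i for i, v in enumerate(x) if v <= threshold]
    right = [i for i, v in enumerate(x) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(x)
    score = 0.0
    for y in responses:
        total = _variance(y)
        if total == 0:
            continue  # a constant response cannot be improved by any split
        within = (len(left) * _variance([y[i] for i in left])
                  + len(right) * _variance([y[i] for i in right])) / n
        score += (total - within) / total
    return score


def best_split(x, responses):
    """Threshold on x maximizing mv_split_score, or None if x is constant."""
    candidates = sorted(set(x))[:-1]
    return max(candidates, key=lambda c: mv_split_score(x, responses, c), default=None)
```

For example, `sidify({"age": [23, 45, 31, 52], "sex": ["F", "M", "F", "M"]})` places `age` on the range 1 to 30 and the coded `sex` feature on 31 to 32, so the two ranges cannot overlap. In a full forest, `best_split` would be evaluated over many candidate features at every node; the abstract does not state which features serve as predictors for the sidified responses, so the sketch leaves that choice to the caller.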

Main Results:

  • The sidification process is unique and reproducible.
  • sidClustering successfully identifies clusters arising from both categorical and continuous variables.
  • The algorithm retains the advantages of traditional random forests.

Conclusions:

  • sidClustering offers a robust approach for unsupervised clustering with mixed data types.
  • The method is validated on simulated and real-world datasets, including cancer and cardiovascular patient data.