Missing data imputation using classification and regression trees.

Cheng-Yang Chen¹, Yu-Wei Chang¹

¹Department of Statistics, National Chengchi University, Taipei, Taiwan.

PeerJ Computer Science

July 10, 2024

Summary

This summary is machine-generated.

Classification and Regression Trees (CART) imputation methods vary in accuracy. The best approach for missing data depends on variable type and correlation, with specific recommendations for ordinal and quantitative variables under different missingness assumptions.

Keywords:

Classification and regression trees; Missing data; Missing data imputation; Resampling

Area of Science:

Statistics
Data Science
Machine Learning

Background:

Missing data is a prevalent issue in real-world data analysis.
Data imputation is a common technique to handle missing values.
Classification and Regression Trees (CART) are frequently used for imputation.
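
To make the idea of tree-based imputation concrete, the following is a minimal sketch, not the method evaluated in the study. It assumes a single predictor `x` and a single regression split (a "stump") chosen by minimum within-node sum of squares; missing `y` values are then filled with the mean of the leaf the row falls into. The function names and the toy data are hypothetical.

```python
from statistics import mean


def best_split(xs, ys):
    """Return the threshold on x that minimises the within-node sum of squares of y."""
    pairs = sorted(zip(xs, ys))
    best_thr, best_sse = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between two equal x values
        thr = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [y for x, y in pairs if x <= thr]
        right = [y for x, y in pairs if x > thr]
        sse = sum((y - mean(left)) ** 2 for y in left)
        sse += sum((y - mean(right)) ** 2 for y in right)
        if sse < best_sse:
            best_thr, best_sse = thr, sse
    return best_thr


def impute_with_stump(rows):
    """rows is a list of (x, y) pairs where y may be None; returns a filled copy."""
    observed = [(x, y) for x, y in rows if y is not None]
    xs, ys = zip(*observed)
    thr = best_split(xs, ys)
    if thr is None:
        # All observed x values are equal: fall back to the overall mean.
        overall = mean(ys)
        return [(x, overall if y is None else y) for x, y in rows]
    left_mean = mean(y for x, y in observed if x <= thr)
    right_mean = mean(y for x, y in observed if x > thr)
    return [
        (x, y if y is not None else (left_mean if x <= thr else right_mean))
        for x, y in rows
    ]


# Hypothetical toy data: two clusters, one missing y in each.
data = [(1, 10.0), (2, 12.0), (3, None), (8, 30.0), (9, 32.0), (10, None)]
filled = impute_with_stump(data)
```

A full CART fit would grow the tree recursively and prune it; a single split is enough to show how leaf means become imputed values.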

Purpose of the Study:

To explore a novel perspective on CART-based missing data imputation.
To compare the performance of existing CART imputation methods.
To identify imputation strategies with superior accuracy across diverse conditions.

Main Methods:

Utilizing resampling algorithms for a new perspective on CART imputation.
Conducting simulation studies to compare various CART imputation techniques.
Applying selected imputation methods to real-world datasets (Hepatitis, Credit Approval).
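
A simulation comparison of this kind typically hides values that are actually known, imputes them, and scores the imputed values against the truth. The sketch below assumes root-mean-square error as the accuracy measure and uses plain mean imputation as a stand-in imputer; both are illustrative assumptions, since this summary does not state which measure the study used.

```python
import math
from statistics import mean


def mean_impute(values):
    """Stand-in imputer: replace each None with the mean of the observed values."""
    observed_mean = mean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]


def rmse_on_masked(truth, masked, imputed):
    """Root-mean-square error computed only over cells that were masked out."""
    squared_errors = [
        (t - i) ** 2 for t, m, i in zip(truth, masked, imputed) if m is None
    ]
    return math.sqrt(mean(squared_errors)) if squared_errors else 0.0


# Hypothetical toy example: two of four known values are hidden, then imputed.
truth = [2.0, 4.0, 6.0, 8.0]
masked = [2.0, None, 6.0, None]
imputed = mean_impute(masked)
score = rmse_on_masked(truth, masked, imputed)
```

Any other imputer with the same list-in, list-out shape could replace `mean_impute`, which makes it straightforward to compare several methods on the same masked data.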

Main Results:

The optimal imputation method is contingent upon variable correlation.
For ordinal variables with data missing completely at random (MCAR) or missing at random (MAR), `rpart` with surrogate variables is recommended when the correlation is greater than 0.
For data missing not at random (MNAR), chi-squared tests and `rpart` with surrogate variables are suggested.
For quantitative variables with moderate correlation, iterative imputation performs best.
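
The idea behind surrogate variables can be sketched as follows. When the variable used for a split is missing for a row, the tree routes that row with a second variable whose own split agrees most often with the primary split. This is a simplified illustration in plain Python, not `rpart`'s implementation: it considers only splits of the form `surrogate <= threshold`, uses a single surrogate, and the names `x1`, `x2`, and both functions are hypothetical.

```python
def best_surrogate_threshold(primary_goes_left, surrogate_values):
    """Find the surrogate threshold whose left/right routing best matches the primary split.

    Returns (threshold, number_of_rows_in_agreement).
    """
    values = sorted(set(surrogate_values))
    best_thr, best_agree = None, -1
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        agree = sum(
            (s <= thr) == left
            for s, left in zip(surrogate_values, primary_goes_left)
        )
        if agree > best_agree:
            best_thr, best_agree = thr, agree
    return best_thr, best_agree


def goes_left(row, primary_thr, surrogate_thr, majority_left):
    """Route a row: primary variable first, then the surrogate, then the majority side."""
    if row["x1"] is not None:
        return row["x1"] <= primary_thr
    if row["x2"] is not None:
        return row["x2"] <= surrogate_thr
    return majority_left


# Hypothetical complete cases used to learn the surrogate split.
x1 = [1, 2, 3, 7, 8, 9]          # primary split variable
x2 = [10, 12, 11, 20, 22, 19]    # candidate surrogate variable
primary_thr = 5
primary_left = [v <= primary_thr for v in x1]
surrogate_thr, agreement = best_surrogate_threshold(primary_left, x2)
majority_left = 2 * sum(primary_left) >= len(primary_left)

# A row with the primary variable missing is routed by the surrogate.
went_left = goes_left({"x1": None, "x2": 21}, primary_thr, surrogate_thr, majority_left)
```

Once a row with a missing primary value has been routed to a leaf, the leaf's prediction can serve as the imputed value.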

Conclusions:

The choice of CART imputation method requires careful consideration of data characteristics.
Variable correlation and missing data mechanisms (MCAR, MAR, MNAR) significantly influence imputation performance.
Specific CART-based strategies offer improved accuracy for different data types and missingness scenarios.
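
The three missingness mechanisms differ in what the probability of a value being missing depends on: nothing (MCAR), an observed variable (MAR), or the unobserved value itself (MNAR). The sketch below illustrates this with a logistic model for the probability that `y` is missing. The logistic form, the `base` rate, and the `slope` are assumptions chosen for illustration and are not taken from the study.

```python
import math
import random


def missing_probability(mechanism, x, y, base=0.3, slope=1.0):
    """Probability that y is missing for one row under the given mechanism."""
    if mechanism == "MCAR":
        return base  # independent of both x and y
    if mechanism == "MAR":
        driver = x  # depends only on the observed covariate
    elif mechanism == "MNAR":
        driver = y  # depends on the value that would go missing
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    intercept = math.log(base / (1 - base))
    return 1 / (1 + math.exp(-(intercept + slope * driver)))


def mask_y(rows, mechanism, seed=0):
    """Return a copy of rows in which y is set to None according to the mechanism."""
    rng = random.Random(seed)
    return [
        (x, None if rng.random() < missing_probability(mechanism, x, y) else y)
        for x, y in rows
    ]


# Hypothetical correlated data: y is x plus Gaussian noise.
data_rng = random.Random(42)
rows = []
for _ in range(200):
    x = data_rng.gauss(0, 1)
    rows.append((x, x + data_rng.gauss(0, 0.5)))

masked = {name: mask_y(rows, name) for name in ("MCAR", "MAR", "MNAR")}
```

Under MAR the missingness can be explained from `x` alone, whereas under MNAR it is tied to `y` itself, which is why the mechanism matters for how well an imputation method can recover the hidden values.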