Missing data imputation using classification and regression trees.

Cheng-Yang Chen¹, Yu-Wei Chang¹

  • ¹ Department of Statistics, National Chengchi University, Taipei, Taiwan.

PeerJ Computer Science | July 10, 2024
Summary
This summary is machine-generated.

Classification and Regression Trees (CART) imputation methods vary in accuracy. The best approach for missing data depends on variable type and correlation, with specific recommendations for ordinal and quantitative variables under different missingness assumptions.

Keywords:
Classification and regression trees; Missing data; Missing data imputation; Resampling

Area of Science:

  • Statistics
  • Data Science
  • Machine Learning

Background:

  • Missing data is a prevalent issue in real-world data analysis.
  • Data imputation is a common technique to handle missing values.
  • Classification and Regression Trees (CART) are frequently used for imputation.

Purpose of the Study:

  • To explore a novel perspective on CART-based missing data imputation.
  • To compare the performance of existing CART imputation methods.
  • To identify imputation strategies with superior accuracy across diverse conditions.

Main Methods:

  • Utilizing resampling algorithms for a new perspective on CART imputation.
  • Conducting simulation studies to compare various CART imputation techniques.
  • Applying selected imputation methods to real-world datasets (Hepatitis, Credit Approval).
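
As a rough illustration of what tree-based imputation means, the sketch below fits a single CART-style split (a regression stump) on the complete rows and fills each missing value with the mean of the leaf its row falls into. It is not the authors' procedure and does not reproduce `rpart`, surrogate variables, resampling, or iterative imputation; the names `best_split` and `impute_with_stump` and the toy data are assumptions made for this example.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (the CART regression impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean) for the split on x that
    minimizes the total squared error of y, or None if x has one value."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent x values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, mean(left), mean(right))
    return None if best is None else best[1:]


def impute_with_stump(rows):
    """rows: list of (x, y) where y may be None. Fit on complete rows,
    then replace each missing y with the mean of the matching leaf."""
    complete = [(x, y) for x, y in rows if y is not None]
    xs, ys = zip(*complete)
    split = best_split(xs, ys)
    if split is None:  # no usable split: fall back to the overall mean
        fill = mean(ys)
        return [(x, fill if y is None else y) for x, y in rows]
    t, left_mean, right_mean = split
    return [
        (x, (left_mean if x <= t else right_mean) if y is None else y)
        for x, y in rows
    ]


# Hypothetical toy data: y is missing for x = 3.0 and x = 10.0.
rows = [(1.0, 10.0), (2.0, 12.0), (3.0, None),
        (8.0, 30.0), (9.0, 32.0), (10.0, None)]
imputed = impute_with_stump(rows)
```

On this toy data the stump splits between x = 2 and x = 8, so the two missing values are filled with the leaf means 11.0 and 31.0. A full CART would keep splitting recursively and could use several predictors.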

Main Results:

  • The optimal imputation method is contingent upon variable correlation.
  • For ordinal variables, `rpart` with surrogate variables is recommended when data are missing completely at random (MCAR) or missing at random (MAR) (correlation > 0).
  • For data missing not at random (MNAR), chi-squared tests and `rpart` with surrogate variables are suggested.
  • For quantitative variables with moderate correlation, iterative imputation performs best.
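
The idea behind surrogate variables can be sketched in a few lines: when the variable used by a tree split is missing for a row, a second variable whose own split best mimics the primary split decides the branch instead. The code below is a simplified illustration under that assumption only. It is written in Python rather than R, it is not the `rpart` implementation (which applies further rules not reproduced here), and the names `best_surrogate` and `route` and the data are hypothetical.

```python
def best_surrogate(xs, zs, threshold):
    """Find the cut on z whose left/right assignment agrees most often with
    the primary split `x <= threshold`. Returns (cut, small_goes_left, agree).
    xs and zs are paired values from rows where both variables are observed."""
    target = [x <= threshold for x in xs]
    cuts = sorted(set(zs))
    best = None
    for lo, hi in zip(cuts, cuts[1:]):
        cut = (lo + hi) / 2
        # Try both orientations: small z goes left, or small z goes right.
        for small_goes_left in (True, False):
            agree = sum(
                ((z <= cut) == small_goes_left) == t
                for z, t in zip(zs, target)
            )
            if best is None or agree > best[2]:
                best = (cut, small_goes_left, agree)
    return best


def route(x, z, threshold, surrogate):
    """Send a row left or right, using the surrogate only when x is missing."""
    if x is not None:
        return "left" if x <= threshold else "right"
    cut, small_goes_left, _ = surrogate
    return "left" if (z <= cut) == small_goes_left else "right"


# Hypothetical data: z tracks x closely, so it makes a good surrogate.
xs = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
zs = [5.0, 6.0, 4.0, 20.0, 22.0, 19.0]
surrogate = best_surrogate(xs, zs, threshold=5.0)
branch = route(None, 21.0, 5.0, surrogate)  # x missing, fall back to z
```

Here the cut z <= 12.5 reproduces the primary split on all six rows, so a row with missing x and z = 21.0 is routed to the right branch. The weaker the correlation between the two variables, the less often a surrogate agrees with the primary split.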

Conclusions:

  • The choice of CART imputation method requires careful consideration of data characteristics.
  • Variable correlation and missing data mechanisms (MCAR, MAR, MNAR) significantly influence imputation performance.
  • Specific CART-based strategies offer improved accuracy for different data types and missingness scenarios.
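
The three missingness mechanisms differ in what the probability of a value being missing depends on: nothing (MCAR), only observed variables (MAR), or the unobserved value itself (MNAR). The sketch below generates each pattern on toy data so the difference is visible. The function name `make_missing`, the median-based rule, and the data are assumptions chosen for illustration and are not taken from the study's simulation design.

```python
import random
from statistics import median


def make_missing(rows, mechanism, rate=0.3, seed=0):
    """rows: list of (x, y) with x always observed. Returns a copy in which
    some y values are replaced by None according to the chosen mechanism."""
    rng = random.Random(seed)
    x_med = median(x for x, _ in rows)
    y_med = median(y for _, y in rows)
    high = min(1.0, 2 * rate)  # drop probability in the affected half
    out = []
    for x, y in rows:
        if mechanism == "MCAR":
            p = rate                           # same chance for every row
        elif mechanism == "MAR":
            p = high if x > x_med else 0.0     # depends on observed x only
        elif mechanism == "MNAR":
            p = high if y > y_med else 0.0     # depends on the hidden y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((x, None if rng.random() < p else y))
    return out


# Hypothetical data in which y decreases as x increases.
rows = [(i, 100 - i) for i in range(10)]
mar = make_missing(rows, "MAR", rate=0.5)
mnar = make_missing(rows, "MNAR", rate=0.5)
mar_mask = [y is None for _, y in mar]
mnar_mask = [y is None for _, y in mnar]
```

With `rate=0.5` the affected half is dropped with probability 1, so MAR removes y for the five rows with the largest x, while MNAR removes the five largest y values, which here belong to the rows with the smallest x. Under MNAR the reason for missingness cannot be recovered from the observed columns alone, which is why imputation tends to be harder in that case.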