missForestPredict: Missing data imputation for prediction settings

Elena Albu¹, Shan Gao¹, Laure Wynants¹,²,³

¹Department of Development & Regeneration, KU Leuven, Leuven, Belgium.

November 7, 2025

Summary

This summary is machine-generated.

The missForestPredict R package offers a fast and user-friendly way to handle missing data in prediction models. It provides competitive imputation results with short computation times, improving prediction accuracy.

Area of Science:

Machine Learning
Statistical Modeling
Data Science

Background:

Missing data is a common challenge in developing and applying prediction models.
Existing imputation methods are often not designed for prediction settings, or are computationally intensive.

Purpose of the Study:

To introduce the missForestPredict R package, an adaptation of the missForest algorithm optimized for prediction tasks.
To provide a fast, user-friendly, and flexible imputation tool for handling missing data in prediction models.

Main Methods:

The missForestPredict algorithm uses iterative random forest imputation, with a unified convergence criterion for continuous and categorical variables.
Imputation models are saved for later application to new data, and the package offers enhanced error monitoring and customization options.
Performance was evaluated against other imputation methods on simulated and real-world datasets using various prediction models.
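
The fit-then-reuse idea in the methods above can be illustrated with a minimal Python sketch. It is not the missForestPredict R API and not the authors' implementation. Several things here are assumptions made for illustration: a tiny k-nearest-neighbour learner is swapped in for the random forest, the stopping rule (total squared change in imputed values) is a placeholder rather than the package's convergence criterion, only numeric columns are handled, and the names `fit_imputer` and `apply_imputer` are hypothetical.

```python
from statistics import mean

def drop(row, j):
    """Return the row without column j (the predictors for column j)."""
    return row[:j] + row[j + 1:]

def knn_fit(X, y):
    # Stand-in learner (assumption): k-NN simply memorizes its training data.
    # The package described above uses random forests instead.
    return (X, y)

def knn_predict(model, x, k=3):
    X, y = model
    nearest = sorted(
        range(len(X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
    )[:k]
    return mean(y[i] for i in nearest)

def fit_imputer(rows, max_iter=10, tol=1e-8):
    """Iteratively impute `rows` (None = missing) and keep every fitted model."""
    n_cols = len(rows[0])
    missing = [[v is None for v in r] for r in rows]
    # Initial guess: column means of the observed values.
    init = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    data = [[init[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]
    steps = []  # one {column: model} dict per iteration
    for _ in range(max_iter):
        models, change = {}, 0.0
        for j in range(n_cols):
            # Fit on rows where column j was observed, using current imputations
            # of the other columns as predictors.
            observed = [i for i in range(len(data)) if not missing[i][j]]
            models[j] = knn_fit([drop(data[i], j) for i in observed],
                                [data[i][j] for i in observed])
            for i, row in enumerate(data):
                if missing[i][j]:
                    new = knn_predict(models[j], drop(row, j))
                    change += (new - row[j]) ** 2
                    row[j] = new
        steps.append(models)
        if change < tol:  # placeholder stopping rule (assumption)
            break
    return {"init": init, "steps": steps}, data

def apply_imputer(imputer, rows):
    """Impute new rows by replaying the saved models; nothing is refitted."""
    n_cols = len(imputer["init"])
    missing = [[v is None for v in r] for r in rows]
    data = [[imputer["init"][j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]
    for models in imputer["steps"]:
        for j in range(n_cols):
            for i, row in enumerate(data):
                if missing[i][j]:
                    row[j] = knn_predict(models[j], drop(row, j))
    return data

# Toy data (made up for illustration): second column is roughly twice the first.
train = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, None], [None, 6.0]]
imputer, train_filled = fit_imputer(train)
new_filled = apply_imputer(imputer, [[2.5, None]])
print(train_filled[4], train_filled[5], new_filled[0])
```

The design point is that `apply_imputer` only reads the stored initial values and per-iteration models, so a single new observation can be imputed at prediction time without access to the training data and without refitting.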

Main Results:

missForestPredict demonstrated competitive prediction performance across diverse datasets and missingness scenarios.
The algorithm achieved these results with substantially shorter computation times than several of the other methods.
The package's features allow for tailored imputation strategies, enhancing its applicability.

Conclusions:

missForestPredict is an effective and efficient tool for handling missing data in prediction modeling.
Its speed, user-friendliness, and flexibility make it a valuable tool for data scientists and researchers.
The package facilitates improved prediction model development and deployment where missing data is present.