



missForestPredict-Missing data imputation for prediction settings.

Elena Albu (1), Shan Gao (1), Laure Wynants (1, 2, 3)

  • (1) Department of Development & Regeneration, KU Leuven, Leuven, Belgium.

PLOS ONE | November 7, 2025
Summary
This summary is machine-generated.

The missForestPredict R package offers a fast and user-friendly way to impute missing data when developing and applying prediction models. In the evaluations reported, it achieved prediction performance competitive with other imputation methods at short computation times.


Area of Science:

  • Machine Learning
  • Statistical Modeling
  • Data Science

Background:

  • Missing data is a common challenge in developing and applying prediction models.
  • Existing imputation methods may not be optimal for prediction settings or can be computationally intensive.

Purpose of the Study:

  • To introduce the missForestPredict R package, an adaptation of the missForest algorithm optimized for prediction tasks.
  • To provide a fast, user-friendly, and flexible imputation tool for handling missing data in prediction models.

Main Methods:

  • The missForestPredict algorithm uses iterative random forest imputation, with a unified convergence criterion for continuous and categorical variables.
  • Imputation models are saved for later application to new data, and the package offers enhanced error monitoring and customization options.
  • Performance was evaluated against other imputation methods on simulated and real-world datasets using various prediction models.
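The fit-once, apply-later pattern described above can be illustrated with a minimal sketch. This is not the package's implementation or API: all function names here are hypothetical, a k-nearest-neighbour average stands in for the random forest learner, the stopping rule (squared change between successive imputations) is an assumed simplification rather than the package's unified criterion, and only continuous variables are handled.

```python
import math

MISSING = None  # marker for a missing cell


def column_means(rows):
    """Mean of the observed values in each column (used as the initial fill)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        means.append(sum(observed) / len(observed))
    return means


def knn_predict(train_x, train_y, x, k=2):
    """Stand-in learner (NOT a random forest): mean target of the k nearest rows."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def drop(row, j):
    """Predictor vector for column j: every other column of the row."""
    return row[:j] + row[j + 1:]


def fit_imputer(rows, max_iter=5, tol=1e-6):
    """Impute training rows iteratively and keep the model from every iteration."""
    means = column_means(rows)
    filled = [[means[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]
    models = []  # one dict {column: (train_x, train_y)} per iteration
    for _ in range(max_iter):
        step, change = {}, 0.0
        for j in range(len(means)):
            miss = [i for i, r in enumerate(rows) if r[j] is MISSING]
            if not miss:
                continue  # column fully observed in training: no model fitted
            obs = [i for i, r in enumerate(rows) if r[j] is not MISSING]
            train_x = [drop(filled[i], j) for i in obs]
            train_y = [filled[i][j] for i in obs]
            step[j] = (train_x, train_y)
            for i in miss:
                new = knn_predict(train_x, train_y, drop(filled[i], j))
                change += (new - filled[i][j]) ** 2
                filled[i][j] = new
        models.append(step)
        if change < tol:  # assumed stopping rule, see the note above
            break
    return {"means": means, "models": models}, filled


def impute_new(imputer, rows):
    """Apply the saved models to unseen rows without refitting anything."""
    means = imputer["means"]
    filled = [[means[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]
    for step in imputer["models"]:
        for j, (train_x, train_y) in step.items():
            for i, r in enumerate(rows):
                if r[j] is MISSING:
                    filled[i][j] = knn_predict(train_x, train_y, drop(filled[i], j))
    return filled


# Toy data, invented for illustration only.
train = [
    [1.0, 2.0, 3.0],
    [2.0, MISSING, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, MISSING],
]
imputer, train_filled = fit_imputer(train)
new_filled = impute_new(imputer, [[2.5, MISSING, 7.5]])
```

The point of returning `imputer` separately from `train_filled` is that new observations are imputed using only what was learned from the training data, which is what a deployed prediction model needs. In this sketch, a column that was fully observed during training has no saved model, so a missing value in that column of new data keeps its training-mean fill.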

Main Results:

  • missForestPredict demonstrated competitive prediction performance across diverse datasets and missingness scenarios.
  • The algorithm achieved these results in considerably shorter computation times than several other methods.
  • The package's features allow for tailored imputation strategies, enhancing its applicability.

Conclusions:

  • missForestPredict is an effective and efficient tool for handling missing data in prediction modeling.
  • Its speed, user-friendliness, and flexibility make it a valuable addition for data scientists and researchers.
  • The package facilitates improved prediction model development and deployment where missing data is present.