



Handling high-dimensional data with missing values by modern machine learning techniques.

Sixia Chen¹, Chao Xu¹

  • ¹Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA.

Journal of Applied Statistics | February 23, 2023
Summary
This summary is machine-generated.

Modern machine learning methods, including deep learning (DL) and XGBoost, can handle missing data in high-dimensional datasets such as those arising in genetic, financial, and geographical studies. In simulation studies and a real-world application, DL and XGBoost balanced bias and variance better than the other methods compared.

Keywords:
Deep learning, high-dimensional data, imputation, machine learning, missing data

Area of Science:

  • Statistics
  • Computer Science
  • Bioinformatics

Background:

  • High-dimensional data are prevalent in genetics, finance, and geography.
  • Missing data in these datasets can introduce significant bias.
  • Proper handling of missing data is crucial for accurate analysis.

Purpose of the Study:

  • To explore modern machine learning techniques for handling missing data in high-dimensional settings.
  • To evaluate the performance of penalized regression, tree-based methods, and deep learning (DL).
  • To compare imputation-based, propensity score, and doubly robust estimators.

Main Methods:

  • Utilized penalized regression, tree-based approaches (e.g., XGBoost), and deep learning (DL).
  • Applied imputation-based, propensity score, and doubly robust estimation strategies.
  • Conducted simulation studies and analyzed a real-world dataset.
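
The three estimation strategies named above can be illustrated with a minimal sketch for estimating a population mean when the outcome is missing at random. This is not the authors' implementation: it assumes a single covariate, and it swaps simple linear and logistic working models in for the penalized regression, XGBoost, and DL learners the study evaluates. The simulated data, the true mean of 2.0, and all function names are assumptions made for illustration.

```python
import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_linear(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_logistic(x, r, steps=25):
    """Newton-Raphson fit of P(r = 1 | x) = sigmoid(a + b*x); returns (a, b)."""
    a = b = 0.0
    for _ in range(steps):
        p = [sigmoid(a + b * xi) for xi in x]
        g0 = sum(ri - pi for ri, pi in zip(r, p))
        g1 = sum((ri - pi) * xi for ri, pi, xi in zip(r, p, x))
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def estimate_means(x, y, r):
    """x: covariate for every unit; y: outcome (None if missing); r: 1 if observed."""
    n = len(x)
    obs = [i for i in range(n) if r[i]]
    mis = [i for i in range(n) if not r[i]]
    # Outcome model m(x), fitted on respondents only.
    a, b = fit_linear([x[i] for i in obs], [y[i] for i in obs])
    m = [a + b * xi for xi in x]
    # Propensity model p(x) = P(observed | x), fitted on all units.
    c, d = fit_logistic(x, r)
    p = [sigmoid(c + d * xi) for xi in x]
    return {
        # Naive mean of observed outcomes (ignores the missingness mechanism).
        "complete_case": sum(y[i] for i in obs) / len(obs),
        # Imputation: fill each missing outcome with its model prediction.
        "imputation": (sum(y[i] for i in obs) + sum(m[i] for i in mis)) / n,
        # Propensity score (inverse probability weighting, normalized weights).
        "propensity": sum(y[i] / p[i] for i in obs) / sum(1.0 / p[i] for i in obs),
        # Doubly robust: predictions plus propensity-weighted residual correction.
        "doubly_robust": (sum(m) + sum((y[i] - m[i]) / p[i] for i in obs)) / n,
    }

def simulate(n=5000, seed=0):
    """Hypothetical data: y = 2 + 3x + noise; larger x is more likely observed."""
    rng = random.Random(seed)
    x, y, r = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        ri = 1 if rng.random() < sigmoid(0.5 + xi) else 0
        yi = 2.0 + 3.0 * xi + rng.gauss(0.0, 1.0)
        x.append(xi)
        r.append(ri)
        y.append(yi if ri else None)
    return x, y, r

est = estimate_means(*simulate())
for name, value in est.items():
    print(f"{name:>14}: {value:.3f}")
```

In this simulated setting the complete-case mean drifts well above the true value because units with larger covariate values respond more often, while the three adjusted estimators should land near 2.0. The doubly robust form stays consistent if either the outcome model or the propensity model is correctly specified, which is one reason flexible learners are attractive as plug-ins for both.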

Main Results:

  • Deep learning (DL) and XGBoost demonstrated superior performance in balancing bias and variance.
  • These advanced methods effectively addressed missing data challenges in high-dimensional analysis.
  • Simulation studies and real-world application confirmed the benefits of DL and XGBoost.

Conclusions:

  • Modern machine learning techniques, particularly DL and XGBoost, offer significant advantages for high-dimensional missing data analysis.
  • These methods provide robust estimation of population means and percentiles.
  • The findings have implications for various fields utilizing big data, including genetics and finance.
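
For percentiles, one common way to adapt a propensity score estimator is to take a weighted percentile of the observed outcomes, with each respondent weighted by the inverse of its estimated response probability. The sketch below shows only that weighting step, under the assumption that the weights have already been produced by some propensity model; `weighted_percentile` is a hypothetical helper, not a function from the study.

```python
def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative share of total weight reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]

# Equal weights reduce to an ordinary (lower) median.
print(weighted_percentile([1, 2, 3, 4], [1, 1, 1, 1], 0.5))

# A respondent with a low response probability gets a large weight
# and pulls the estimated median toward its own value.
print(weighted_percentile([1, 2, 3], [1, 1, 8], 0.5))
```

With equal weights the function returns the usual sample percentile, so the weighting only matters when response probabilities differ across units.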