Handling high-dimensional data with missing values by modern machine learning techniques.

Sixia Chen¹, Chao Xu¹

¹Department of Biostatistics and Epidemiology, The University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA.

Journal of Applied Statistics

February 23, 2023

Summary

This summary is machine-generated.

Modern machine learning methods, including deep learning (DL) and XGBoost, can handle missing data in high-dimensional datasets of the kind common in genetics, finance, and geography. In the study's simulations and real-data application, DL and XGBoost balanced bias and variance better than the other approaches compared.

Keywords:

Deep learning, high-dimensional data, imputation, machine learning, missing data

Area of Science:

Statistics
Computer Science
Bioinformatics

Background:

High-dimensional data are prevalent in genetics, finance, and geography.
Missing data in these datasets can introduce significant bias.
Proper handling of missing data is crucial for accurate analysis.

Purpose of the Study:

To explore modern machine learning techniques for handling missing data in high-dimensional settings.
To evaluate the performance of penalized regression, tree-based methods, and deep learning (DL).
To compare imputation-based, propensity score, and doubly robust estimators.

Main Methods:

Utilized penalized regression, tree-based approaches (e.g., XGBoost), and deep learning (DL).
Applied imputation-based, propensity score, and doubly robust estimation strategies.
Conducted simulation studies and analyzed a real-world dataset.
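
To make the three estimation strategies concrete, here is a minimal Python sketch for estimating a population mean when the outcome is missing at random. It is an illustration under stated assumptions, not the authors' implementation: a least-squares line and a logistic regression stand in for the penalized regression, XGBoost, and DL models named above, and the simulated data, the function names (`simulate`, `fit_linear`, `fit_logistic`, `estimate_means`), and all parameter values are hypothetical.

```python
import math
import random


def expit(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=5000, seed=0):
    """Hypothetical data: y depends on x, and so does the chance y is observed."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # true mean of y is 2
    r = [1 if rng.random() < expit(0.5 + xi) else 0 for xi in x]  # 1 = observed
    return x, y, r


def fit_linear(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(x, r, steps=25):
    """Newton-Raphson fit of P(r=1 | x) = expit(a + b*x); returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            g0 += ri - p            # gradient, intercept
            g1 += (ri - p) * xi     # gradient, slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def estimate_means(x, y, r):
    """Return naive, imputation, propensity-weighted, and doubly robust means."""
    n = len(x)
    obs = [i for i in range(n) if r[i] == 1]

    # Outcome model, fitted on respondents only (stand-in for an ML regressor).
    a, b = fit_linear([x[i] for i in obs], [y[i] for i in obs])
    m = [a + b * xi for xi in x]

    # Response propensity model (stand-in for an ML classifier).
    c, d = fit_logistic(x, r)
    p = [expit(c + d * xi) for xi in x]

    naive = sum(y[i] for i in obs) / len(obs)
    imputation = sum(m) / n
    # Normalized (Hajek-type) inverse propensity weighting.
    ipw = sum(y[i] / p[i] for i in obs) / sum(1.0 / p[i] for i in obs)
    # Predictions plus inverse-weighted residuals from respondents.
    doubly_robust = (sum(m) + sum((y[i] - m[i]) / p[i] for i in obs)) / n
    return {
        "naive": naive,
        "imputation": imputation,
        "propensity": ipw,
        "doubly_robust": doubly_robust,
    }


if __name__ == "__main__":
    x, y, r = simulate()
    for name, value in estimate_means(x, y, r).items():
        print(f"{name:>14}: {value:.3f}")
```

In this simulated setting the respondent-only mean is pulled upward, because units with larger x respond more often and also have larger y, while the three adjusted estimators should land near the true mean of 2. The doubly robust form remains consistent if either the outcome model or the propensity model is correctly specified, which is the usual motivation for pairing it with flexible learners.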

Main Results:

Among the methods compared, DL and XGBoost achieved the best balance between bias and variance.
Both methods handled missing data effectively in high-dimensional settings.
The simulation studies and the real-data application both supported these findings.

Conclusions:

Modern machine learning techniques, particularly DL and XGBoost, offer significant advantages for high-dimensional missing data analysis.
These methods provide robust estimation of population means and percentiles.
The findings have implications for various fields utilizing big data, including genetics and finance.
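
For percentiles, one common approach is to weight each observed value by the inverse of its response propensity and read the percentile off the weighted empirical distribution. The sketch below shows that idea in Python. It is an assumption for illustration, not necessarily the estimator used in the study: the helper `weighted_percentile`, the simulated data in `demo_median`, and the use of the true (known) propensities in place of a fitted model are all hypothetical.

```python
import math
import random


def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative share of total weight reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum >= q * total:
            return v
    return pairs[-1][0]  # guard against floating-point shortfall when q = 1


def demo_median(n=5000, seed=1):
    """Compare respondent-only and propensity-weighted medians on simulated data."""
    rng = random.Random(seed)
    y_obs, w_obs = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)   # population median of y is 2
        p = 1.0 / (1.0 + math.exp(-(0.5 + x)))    # true response propensity (assumed known)
        if rng.random() < p:                       # y is observed only for respondents
            y_obs.append(y)
            w_obs.append(1.0 / p)
    naive = weighted_percentile(y_obs, [1.0] * len(y_obs), 0.5)
    weighted = weighted_percentile(y_obs, w_obs, 0.5)
    return naive, weighted


if __name__ == "__main__":
    naive, weighted = demo_median()
    print(f"respondent-only median: {naive:.3f}")
    print(f"propensity-weighted median: {weighted:.3f}")
```

In practice the propensities would come from a fitted model (for example, a tree-based or neural classifier) rather than being known, and the quality of the percentile estimate then depends on how well that model captures the response mechanism.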