
Updated: Aug 27, 2025


Missing Value Estimation using Clustering and Deep Learning within Multiple Imputation Framework.

Manar D Samad¹, Sakib Abrar¹, Norou Diawara²

¹Department of Computer Science, Tennessee State University, Nashville, TN 37209, United States.

Knowledge-Based Systems

September 26, 2022

Summary

This summary is machine-generated.

This study improves missing-value imputation by using ensemble learning and deep neural networks as the regressors within Multiple Imputations by Chained Equations (MICE). Adding cluster labels (CISCL) further improves accuracy over standard MICE across the missing data types and percentages tested.

Keywords:

MICE; missing value imputation; clustering; deep learning; ensemble learning; multiple imputations


Area of Science:

Machine Learning
Data Science
Statistical Modeling

Background:

Missing values in tabular data hinder machine learning model performance.
Multiple Imputations by Chained Equations (MICE) is a popular imputation method that estimates each variable's missing values by linear conditioning on the other variables.
Limitations exist in MICE, especially with high missingness percentages and non-random missing data.
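
The chained-equations idea with linear conditioning can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `chained_linear_imputation` is hypothetical, and the sketch produces a single deterministic imputation, whereas full MICE draws several imputations with added randomness. It assumes every column has at least one observed value.

```python
import numpy as np


def chained_linear_imputation(X, n_rounds=10):
    """Fill NaNs by cycling over columns and regressing each on the others.

    Simplified, single-imputation sketch of linear chained equations.
    Assumes every column has at least one observed (non-NaN) value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: replace each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column
            # Predictors: all other columns (current estimates) plus an intercept.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Ordinary least squares on rows where column j was observed.
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            # Overwrite only the originally missing entries.
            filled[rows, j] = design[rows] @ coef
    return filled


# Usage: a toy table whose third column depends linearly on the first two.
demo_rng = np.random.default_rng(0)
demo = demo_rng.normal(size=(50, 3))
demo[:, 2] = demo[:, 0] + demo[:, 1]
demo_with_gaps = demo.copy()
demo_with_gaps[::7, 2] = np.nan
demo_filled = chained_linear_imputation(demo_with_gaps)
```

Because each column is modeled as a linear function of the others, a sketch like this can only recover linear relationships, which is the limitation that motivates swapping in more flexible regressors.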

Purpose of the Study:

To improve imputation accuracy and classification performance of MICE.
To introduce ensemble learning and deep neural networks (DNN) as replacements for MICE's linear regressors.
To enhance imputation accuracy further using cluster labels (CISCL).

Main Methods:

Replaced MICE's linear regressors with ensemble learning and deep neural networks (DNN).
Incorporated cluster labels (CISCL) derived from training data to characterize samples.
Conducted extensive analyses on six tabular datasets with up to 80% missing values across three missing types (missing completely at random, missing at random, and missing not at random).
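
The methods above can be approximated with scikit-learn, as a rough sketch rather than the authors' pipeline. `IterativeImputer` is scikit-learn's chained-equations imputer and accepts any regressor, so a gradient boosting model stands in for the linear one. How the study derives its cluster labels is not described on this page; here the labels come from k-means on a mean-filled copy of the data, which is an assumption, as are the function name `impute_with_cluster_labels` and all parameter values.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import IterativeImputer, SimpleImputer


def impute_with_cluster_labels(X, n_clusters=3, max_iter=5, seed=0):
    """Chained-equation imputation with gradient boosting regressors,
    using one-hot cluster labels as extra, fully observed predictors."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]

    # Assumption: k-means cannot handle NaN, so cluster a mean-filled copy.
    mean_filled = SimpleImputer(strategy="mean").fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(mean_filled)
    one_hot = np.eye(n_clusters)[labels]

    # Append the label columns so every per-column regressor can use them.
    augmented = np.hstack([X, one_hot])
    imputer = IterativeImputer(
        estimator=GradientBoostingRegressor(n_estimators=50, random_state=seed),
        max_iter=max_iter,
        skip_complete=True,  # label columns have no gaps, so no model is fit for them
        random_state=seed,
    )
    completed = imputer.fit_transform(augmented)

    # Return only the original feature columns, plus the labels for inspection.
    return completed[:, :n_features], labels


# Usage: two well-separated groups with 20% of entries removed at random.
demo_rng = np.random.default_rng(0)
demo = np.vstack([demo_rng.normal(loc=c, size=(40, 4)) for c in (0.0, 5.0)])
demo_with_gaps = demo.copy()
demo_with_gaps[demo_rng.random(demo.shape) < 0.2] = np.nan
demo_filled, demo_labels = impute_with_cluster_labels(demo_with_gaps, n_clusters=2)
```

Removing entries uniformly at random, as in the usage lines, corresponds only to the missing-completely-at-random case; the other two missing types would need masks that depend on observed or unobserved values.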

Main Results:

Ensemble learning or DNN within MICE outperformed baseline MICE (b-MICE).
CISCL significantly improved imputation accuracy, with CISCL + b-MICE outperforming b-MICE across all tested missingness percentages and missing types.
Proposed DNN-based MICE and gradient boosting MICE plus CISCL (GB-MICE-CISCL) surpassed seven state-of-the-art methods.
GB-MICE-CISCL enhanced classification accuracy on imputed data across all missingness percentages.
Identified MICE framework shortcomings at >50% missingness and for non-random missing types.

Conclusions:

Ensemble learning and DNN integration enhance MICE imputation and downstream classification accuracy.
CISCL provides a robust method to improve imputation across diverse missing data scenarios.
GB-MICE-CISCL offers a superior imputation strategy, particularly for complex missing data patterns.
The study provides a framework for selecting optimal imputation models based on data characteristics.