SICE: an improved missing data imputation technique.

Shahidul Islam Khan¹,², Abu Sayed Md Latiful Hoque¹

¹Department of CSE, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.

Journal of Big Data

June 18, 2020

Summary

This summary is machine-generated.

This study introduces a novel hybrid technique for missing data imputation that raises the F-measure for binary data imputation by 20% and reduces the error for numeric data imputation by 11%. The method improves data analytics performance in big data environments.

Keywords:

Data Analytics; MICE; Missing Data Imputation; Multiple Imputation; Single Imputation

Area of Science:

Data Science
Machine Learning
Statistics

Background:

Missing data significantly degrades performance in data analytics and can lead to incorrect predictions.
Efficiently handling missing values is crucial in the big data era due to the massive volume of data generated.
Existing imputation methods may not be optimal for all data types, necessitating improved techniques.

Purpose of the Study:

To propose a novel hybrid technique for missing data imputation, combining single and multiple imputation approaches.
To extend the Multivariate Imputation by Chained Equations (MICE) algorithm for both categorical and numeric data.
To evaluate the proposed technique against twelve existing algorithms using diverse real-world and public datasets.
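
One way to picture "combining single and multiple imputation" is to generate several candidate values for a missing cell and then collapse them into one. The sketch below is an illustrative assumption, not the authors' published procedure: the pooling rules (mean for numeric cells, majority vote for categorical cells) and the function names `pool_numeric` and `pool_categorical` are hypothetical.

```python
from collections import Counter
from statistics import mean


def pool_numeric(candidates):
    """Collapse several imputed values for one numeric cell into their mean."""
    return mean(candidates)


def pool_categorical(candidates):
    """Collapse several imputed labels for one categorical cell by majority vote."""
    # most_common(1) returns the label with the highest count; ties resolve
    # to the label that was encountered first.
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical candidate values produced by three separate imputation runs.
numeric_draws = [4.0, 5.0, 6.0]
binary_draws = ["yes", "no", "yes"]

pooled_numeric = pool_numeric(numeric_draws)
pooled_binary = pool_categorical(binary_draws)
```

The result is a single completed value per cell, as in single imputation, while still drawing on the spread of several imputation runs.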

Main Methods:

Developed a hybrid imputation technique extending the Multivariate Imputation by Chained Equations (MICE) algorithm.
Implemented two variations of the extended MICE algorithm for categorical and numeric data imputation.
Compared the proposed method with twelve existing algorithms on sixty-five thousand real health records and three public datasets.
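
For readers unfamiliar with chained equations, the core loop can be sketched as follows. This is a deliberately simplified, deterministic illustration for two numeric columns, not the extended algorithm from the study: standard MICE draws imputations from a predictive distribution and repeats the process to build several completed datasets, which this sketch omits. The names `fit_line` and `chained_impute` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # Constant predictor: fall back to predicting the mean of y.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def chained_impute(rows, n_iter=10):
    """Impute None entries in a two-column numeric table by chained regressions."""
    n = len(rows)
    missing = [[value is None for value in row] for row in rows]
    filled = [list(row) for row in rows]

    # Step 1: start from a simple guess, the mean of each column's observed values.
    for j in range(2):
        observed = [row[j] for row in rows if row[j] is not None]
        col_mean = sum(observed) / len(observed)
        for i in range(n):
            if missing[i][j]:
                filled[i][j] = col_mean

    # Step 2: cycle through the columns, regressing each on the other and
    # overwriting only the originally missing cells with the predictions.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # index of the predictor column
            xs = [filled[i][k] for i in range(n) if not missing[i][j]]
            ys = [filled[i][j] for i in range(n) if not missing[i][j]]
            slope, intercept = fit_line(xs, ys)
            for i in range(n):
                if missing[i][j]:
                    filled[i][j] = slope * filled[i][k] + intercept
    return filled


# Toy data in which the second column is exactly twice the first.
table = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (None, 8.0), (5.0, 10.0)]
completed = chained_impute(table)
```

On this toy table the missing cells are recovered from the linear relationship between the columns, and observed cells are never modified. Categorical columns would need a classifier in place of the regression step.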

Main Results:

The proposed algorithm achieved a 20% higher F-measure for binary data imputation compared to existing methods.
The new technique demonstrated an 11% reduction in error for numeric data imputation.
The performance improvements were achieved with execution times comparable to those of existing algorithms.
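
The two kinds of result above rest on different metrics. As a minimal sketch, the F-measure for binary imputation can be computed by comparing imputed labels with held-out true labels. The summary does not name the error metric used for numeric data, so root-mean-square error is assumed here purely for illustration. The function names and sample values are hypothetical.

```python
from math import sqrt


def f_measure(true_labels, imputed_labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(true_labels, imputed_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(true_values, imputed_values):
    """Root-mean-square error between true and imputed numeric values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical held-out cells: the true values versus what an imputer filled in.
binary_score = f_measure([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
numeric_error = rmse([2.0, 4.0], [2.0, 6.0])
```

A higher F-measure and a lower error both indicate better imputation, which is why the two results are reported in opposite directions.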

Conclusions:

The proposed hybrid imputation technique offers superior performance for both binary and numeric data.
The extended MICE algorithm effectively handles missing values in diverse datasets, including sensitive health records.
This advancement contributes to more accurate predictions and better utilization of big data in analytics.