SICE: an improved missing data imputation technique.

Shahidul Islam Khan¹,², Abu Sayed Md Latiful Hoque¹

  • ¹ Department of CSE, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh.

Journal of Big Data | June 18, 2020
Summary
This summary is machine-generated.

This study introduces a hybrid missing data imputation technique that extends the MICE algorithm to both categorical and numeric data. Compared with existing methods, it achieves a 20% higher F-measure for binary data imputation and an 11% lower error for numeric data imputation, with comparable execution time.

Keywords:
Data Analytics, MICE, Missing Data Imputation, Multiple Imputation, Single Imputation

Area of Science:

  • Data Science
  • Machine Learning
  • Statistics

Background:

  • Missing data significantly degrades performance in data analytics and can lead to incorrect predictions.
  • Efficiently handling missing values is crucial in the big data era due to the massive volume of data generated.
  • Existing imputation methods may not be optimal for all data types, necessitating improved techniques.

Purpose of the Study:

  • To propose a novel hybrid technique for missing data imputation, combining single and multiple imputation approaches.
  • To extend the Multivariate Imputation by Chained Equations (MICE) algorithm for both categorical and numeric data.
  • To evaluate the proposed technique against twelve existing algorithms using diverse real-world and public datasets.
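
One way to read "combining single and multiple imputation" is to run an imputer several times and collapse the candidate values for each missing cell into one central value. The sketch below illustrates that idea only. The pooling rules (mean for numeric cells, most common label for categorical cells), the function names, and the sample values are assumptions made for illustration and are not taken from this summary.

```python
from collections import Counter
from statistics import mean


def pool_numeric(candidates):
    """Collapse several imputed values for one numeric cell into their mean."""
    return mean(candidates)


def pool_categorical(candidates):
    """Collapse several imputed labels for one cell into the most common label."""
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical outputs of five separate imputation runs for one missing cell.
numeric_runs = [4.0, 5.0, 6.0, 5.0, 5.0]
binary_runs = [1, 0, 1, 1, 0]

pooled_value = pool_numeric(numeric_runs)      # mean of the five candidates
pooled_label = pool_categorical(binary_runs)   # majority label among the five
print(pooled_value, pooled_label)
```

Pooling yields a single completed dataset, as single imputation does, while still drawing on the variation across several imputation runs.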

Main Methods:

  • Developed a hybrid imputation technique extending the Multivariate Imputation by Chained Equations (MICE) algorithm.
  • Implemented two variations of the extended MICE algorithm for categorical and numeric data imputation.
  • Compared the proposed method with twelve existing algorithms on sixty-five thousand real health records and three public datasets.
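
For readers unfamiliar with chained equations, the following is a minimal sketch of the general idea behind MICE-style imputation, not the authors' extended algorithm. It assumes two numeric columns, fills missing cells with column means, and then repeatedly re-predicts each column's missing cells from the other column by simple linear regression. Standard MICE implementations also add random draws to produce multiple imputations and support many columns; this deterministic version omits both. All function names and the sample rows are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # constant predictor: fall back to the mean of y
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def chained_impute(rows, n_iter=10):
    """Impute None entries in two-column numeric rows by chained regressions."""
    n_cols = 2
    missing = [[r[j] is None for j in range(n_cols)] for r in rows]
    # Step 1: initialise every missing cell with its column's observed mean.
    col_means = [mean(r[j] for r in rows if r[j] is not None)
                 for j in range(n_cols)]
    data = [[col_means[j] if r[j] is None else float(r[j])
             for j in range(n_cols)] for r in rows]
    # Step 2: cycle through the columns, re-predicting only the missing cells.
    for _ in range(n_iter):
        for target in range(n_cols):
            other = 1 - target
            # Train on rows where the target column was actually observed.
            train = [(d[other], d[target])
                     for d, m in zip(data, missing) if not m[target]]
            a, b = fit_line([x for x, _ in train], [y for _, y in train])
            for d, m in zip(data, missing):
                if m[target]:
                    d[target] = a + b * d[other]
    return data


# Hypothetical data in which the second column is twice the first.
rows = [(1, 2), (2, 4), (3, None), (4, 8), (None, 10)]
completed = chained_impute(rows)
print(completed)
```

Because the sample rows follow an exact linear relationship, the imputed cells settle close to the values that relationship implies. Real data would not converge this cleanly.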

Main Results:

  • The proposed algorithm achieved a 20% higher F-measure for binary data imputation compared to existing methods.
  • The new technique demonstrated an 11% reduction in error for numeric data imputation.
  • The performance improvements were achieved with comparable execution times to existing algorithms.
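
The two kinds of result above can be computed by hiding known values, imputing them, and comparing the imputed values with the truth. The sketch below shows F-measure for binary cells and root mean squared error (RMSE) for numeric cells. The summary does not state which numeric error metric the study used, so RMSE is an assumption here, and the function names and sample values are hypothetical.

```python
from math import sqrt


def f_measure(true_labels, imputed_labels, positive=1):
    """Harmonic mean of precision and recall for the positive label."""
    pairs = list(zip(true_labels, imputed_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(true_values, imputed_values):
    """Root mean squared error between true and imputed numeric values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical held-out cells: the true values and what an imputer filled in.
binary_score = f_measure([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
numeric_error = rmse([2.0, 4.0, 6.0], [2.0, 5.0, 6.0])
print(round(binary_score, 3), round(numeric_error, 3))
```

A higher F-measure indicates better binary imputation, while a lower RMSE indicates better numeric imputation.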

Conclusions:

  • The proposed hybrid imputation technique offers superior performance for both binary and numeric data.
  • The extended MICE algorithm effectively handles missing values in diverse datasets, including sensitive health records.
  • This advancement contributes to more accurate predictions and better utilization of big data in analytics.