



Self-Training With Quantile Errors for Multivariate Missing Data Imputation for Regression Problems in Electronic

Hansle Gwon¹,², Imjin Ahn¹,², Yunha Kim¹,²

  • ¹Department of Medical Science, Asan Medical Institute of Convergence Science and Technology, Seoul, Republic of Korea.

JMIR Public Health and Surveillance | October 13, 2021
Summary
This summary is machine-generated.

This study introduces a self-training method for imputing missing values in machine learning datasets, aimed at settings where complete medical data are scarce. The approach imputed missing values significantly more accurately than multiple imputation by chained equations (MICE) and mean imputation.

Keywords:
artificial intelligence, electronic medical records, imputation, self-training


Area of Science:

  • Machine Learning
  • Data Science
  • Medical Informatics

Background:

  • Missing data is a prevalent challenge in real-world machine learning applications.
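  • Existing imputation methods include statistical approaches (mean imputation, expectation-maximization, multiple imputation by chained equations [MICE]) and machine learning techniques (multilayer perceptron [MLP], k-nearest neighbors [k-NN], decision trees).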

Purpose of the Study:

  • To impute numeric medical data, including physical and laboratory values.
  • To develop an effective data imputation strategy using self-training for scarce medical data environments.

Main Methods:

  • Proposed a progressive self-training method to gradually increase available data for model training.
  • Employed pseudolabeling: models trained on complete data predict missing values, and valid predictions are incorporated back into the complete dataset.
  • Iteratively repeated the prediction and incorporation process until a stopping condition was met, evaluating pseudolabel accuracy by its impact on model performance.
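
The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a small k-nearest-neighbors regressor stands in for the Random Forest used in the study, the function names (`knn_predict`, `validation_mse`, `self_train_impute`), the batch size, and the "closest rows first" ordering are all hypothetical, and the quantile errors named in the title are not modeled. A batch of pseudolabels is kept only if a held-out validation error does not get worse, which is one simple reading of "evaluating pseudolabel accuracy by its impact on model performance".

```python
import math
import random
from statistics import mean

def knn_predict(labeled, x, k=3):
    """Average target of the k labeled rows nearest to x (Euclidean distance)."""
    nearest = sorted(labeled, key=lambda row: math.dist(row[0], x))[:k]
    return mean(y for _, y in nearest)

def validation_mse(labeled, valid, k=3):
    """Mean squared error on held-out rows, predicting from the labeled pool."""
    return mean((knn_predict(labeled, x, k) - y) ** 2 for x, y in valid)

def self_train_impute(complete, valid, missing, batch=5, k=3, max_rounds=50):
    """complete/valid: lists of (features, target); missing: feature tuples
    whose target is unknown. Returns ({features: imputed target}, final MSE)."""
    labeled, pending, imputed = list(complete), list(missing), {}
    best = validation_mse(labeled, valid, k)
    for _ in range(max_rounds):
        if not pending:
            break
        # Assumed confidence proxy: rows closest to already-labeled data go first.
        pending.sort(key=lambda x: min(math.dist(x, lx) for lx, _ in labeled))
        batch_x, rest = pending[:batch], pending[batch:]
        pseudo = [(x, knn_predict(labeled, x, k)) for x in batch_x]
        trial = validation_mse(labeled + pseudo, valid, k)
        if trial > best:
            break  # stopping condition: this batch hurts validation error
        labeled += pseudo          # incorporate accepted pseudolabels
        imputed.update(pseudo)
        best, pending = trial, rest
    # Rows never accepted still get a plain prediction from the final pool.
    for x in pending:
        imputed[x] = knn_predict(labeled, x, k)
    return imputed, best

# Synthetic demo data (made up for illustration only).
rng = random.Random(0)

def make_row():
    x = (rng.uniform(0, 10), rng.uniform(0, 10))
    return x, 2 * x[0] + x[1] + rng.gauss(0, 0.5)

complete = [make_row() for _ in range(30)]
valid = [make_row() for _ in range(15)]
missing = [make_row()[0] for _ in range(20)]
imputed, final_mse = self_train_impute(complete, valid, missing)
```

A nearest-neighbor learner is used here because, like a tree ensemble, its predictions change when pseudolabeled rows are added to the pool. An ordinary least-squares fit would not change at all, since its own predictions lie exactly on the fitted line.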

Main Results:

  • Self-training with Random Forest (RF) demonstrated up to 12% lower mean squared error and 0.1% higher Pearson correlation coefficient compared to pure RF.
  • Statistical tests (Friedman, Wilcoxon signed-rank) confirmed the significant improvement of self-training over Multiple Imputation by Chained Equations (MICE) and mean imputation (p < .05 and p = 3.05e-5, respectively).
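
For readers who want to see how the quantities in these results are computed, the following sketch defines mean squared error, the Pearson correlation coefficient, and an exact two-sided Wilcoxon signed-rank p-value in plain Python. The function names and the toy numbers are hypothetical and are not the study's data. One arithmetic observation, offered cautiously: the smallest exact two-sided p-value for n untied pairs is 2 / 2^n, and for n = 16 this is about 3.05e-5, which coincides with the reported value. The summary does not state how many pairs were tested, so this is only a possible reading.

```python
from math import sqrt
from statistics import mean

def mse(truth, pred):
    """Mean squared error between true and imputed values."""
    return mean((t - p) ** 2 for t, p in zip(truth, pred))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def wilcoxon_exact(diffs):
    """Exact two-sided Wilcoxon signed-rank p-value.
    Assumes no zero differences and no ties among the absolute differences."""
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    total = n * (n + 1) // 2
    # counts[s] = number of subsets of ranks {1..n} whose sum is s
    counts = [1] + [0] * total
    for r in range(1, n + 1):
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    lower = sum(counts[: w_plus + 1])   # outcomes with W+ <= observed
    upper = sum(counts[w_plus:])        # outcomes with W+ >= observed
    return min(1.0, 2 * min(lower, upper) / 2 ** n)

# Toy values, made up for illustration only.
truth = [4.1, 5.0, 6.2, 7.4, 8.3]
baseline = [4.6, 4.5, 6.9, 6.8, 8.9]
self_trained = [4.3, 4.8, 6.5, 7.1, 8.6]

# Relative MSE reduction, the kind of figure behind "up to 12% lower MSE".
reduction = 1 - mse(truth, self_trained) / mse(truth, baseline)
r_self = pearson_r(truth, self_trained)

# Sixteen paired differences that all favor one method give the minimum p-value.
p_min = wilcoxon_exact([0.1 * i for i in range(1, 17)])  # equals 2 / 2**16
```

In practice a library routine such as `scipy.stats.wilcoxon` would normally be used instead of the hand-written test; the explicit version is shown only to make the counting argument visible.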

Conclusions:

  • Self-training shows statistically significant improvements in imputing missing values, particularly for medical datasets.
  • Further validation in real-world machine learning systems and refinement of pseudolabel evaluation methods are warranted for future research.