Related Concept Videos

Data Collection by Observations

Data collection is the systematic process of obtaining, observing, measuring, and analyzing accurate information. Observational studies are among the most widely used methods of data collection. They involve collecting data by observing the behavior and physical characteristics of a sample without modifying the sample.
An astronomer viewing the motion and brightness of stars in the sky and recording the data is an example of observational data collection. A botanist recording...

Data Reporting and Recording

Reporting and recording are crucial in data documentation. Timely, thorough, and accurate documentation of facts is essential when recording patient data. Failing to record findings from an assessment, or the interpretation of a problem, results in lost information and makes the patient record unreliable. If the information is not specific, the reader is left with only general impressions. Recording is the documentation of an individual's health information in a traceable, secure, and...

Observational Learning

Observational learning, as described by Albert Bandura and also known as imitation or modeling, occurs when a person observes and imitates another's behavior. It is a quicker process than operant conditioning. A well-known example is the Bobo doll study, in which children who saw an adult act aggressively toward the doll were more likely to act aggressively when left alone than children who had observed a nonaggressive adult. Many psychologists view observational learning as a form of latent learning...

How Data are Classified: Categorical Data

A variable, usually denoted by a capital letter such as X or Y, is a characteristic or measurement that can be determined for each member of a population. Data are the actual values of variables. They may be numbers, or they may be words. A datum is a single value.
Data are classified based on whether or not they are measurable. Categorical data cannot be measured; instead, they are divided into categories. For example, if Y denotes a person's party affiliation, some examples of Y include...

How Data are Classified: Numerical Data

Data that are countable or measurable in specific units are called numerical or quantitative data. Quantitative data are always numbers and result from counting or measuring the attributes of a population. Examples of quantitative data include amounts of money, pulse rate, weight, the number of people living in a town, and the number of students who opt for statistics.
Quantitative data may be either discrete or continuous. All quantitative data that take on only specific numerical...

Model Approaches for Pharmacokinetic Data: Compartment Models

Compartmental analysis is a widely adopted approach to characterizing drug pharmacokinetics. It uses compartment models that conceptualize the body as a collection of reversibly communicating compartments, each representing a group of tissues with similar drug distribution characteristics. The rate at which the drug moves between these compartments is typically described by first-order kinetics.
Two primary types of compartment models are recognized: mammillary and catenary. The more...
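As a rough illustration of first-order transfer between reversibly communicating compartments, the sketch below simulates a two-compartment mammillary model (a central compartment exchanging drug with one peripheral compartment, with elimination from the central compartment) using simple Euler steps. The rate constants k12, k21, and k10, the dose, the time span, and the step size are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass

@dataclass
class TwoCompartmentState:
    central: float     # amount of drug in the central compartment
    peripheral: float  # amount of drug in the peripheral compartment
    eliminated: float  # cumulative amount eliminated from the body

def simulate_two_compartment(dose, k12, k21, k10, t_end, dt=0.001):
    """Euler integration of a two-compartment mammillary model.

    Each flow is first-order (proportional to the amount in its source):
    k12 central -> peripheral, k21 peripheral -> central, k10 elimination
    from the central compartment. The dose is an IV bolus into the central
    compartment; rate constants are in 1/time units.
    """
    state = TwoCompartmentState(central=dose, peripheral=0.0, eliminated=0.0)
    for _ in range(int(round(t_end / dt))):
        to_peripheral = k12 * state.central * dt
        to_central = k21 * state.peripheral * dt
        out = k10 * state.central * dt
        state = TwoCompartmentState(
            central=state.central - to_peripheral + to_central - out,
            peripheral=state.peripheral + to_peripheral - to_central,
            eliminated=state.eliminated + out,
        )
    return state

# Hypothetical example: 100 mg bolus, rate constants in 1/h, 12 h horizon.
final = simulate_two_compartment(dose=100.0, k12=0.5, k21=0.25, k10=0.2, t_end=12.0)
print(final)
```

Because every flow leaves one compartment and enters another (or the eliminated pool), the total amount stays equal to the dose; setting k12 = k21 = 0 reduces the model to a one-compartment model with exponential decay.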

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Simulated treatment comparisons with jackknife pseudo values for estimating population-adjusted marginal treatment effects.

Journal of biopharmaceutical statistics·2026

Same author

Acute COPD Exacerbation and long-term cardiopulmonary outcomes: Real-world EXACOS-CP evidence study in Israel.

Respiratory medicine·2026

Same author

Building Robust Foundations for Data Interoperability in Hematological Malignancies.

Studies in health technology and informatics·2026

Same author

Reply to "Stepping up ICS doses in asthma: the crucial roles of relievers, triple therapy, and biomarkers" and "Methodological challenges in evaluating high-dose ICS escalation".

The journal of allergy and clinical immunology. In practice·2026

Same author

Real-World Treatment Patterns and Outcomes for Patients With Metastatic Triple-Negative Breast Cancer in the United States: An Observational Study.

JCO oncology practice·2026

Same author

Dimensionality Reduction Techniques for Improving Propensity Score Specification: An Application to a Cohort Study Using Claims Data.

Pharmacoepidemiology and drug safety·2025

Same journal

French Consumption of Methylphenidate in Primary Care From 2016 to 2023, Impact of Prescribing Policy Changes-A Time-Series Analysis.

Pharmacoepidemiology and drug safety·2026

Same journal

Uptake and Use of Biologic Therapies in Paediatric Immune-Mediated Inflammatory Diseases: An Australian Population-Based Study.

Pharmacoepidemiology and drug safety·2026

Same journal

Comparative Effectiveness of Oral Fluoropyrimidines Versus FOLFOX as Adjuvant Therapy for Stage III Colon Cancer: A Retrospective Cohort Study Using Overlap-Weighted Restricted Mean Survival Time Analysis.

Pharmacoepidemiology and drug safety·2026

Same journal

Association Between EGFR-TKI-Associated Skin Rash and Recorded Mortality in Non-Small Cell Lung Cancer: A Real-World Analysis Accounting for Immortal Time Bias.

Pharmacoepidemiology and drug safety·2026

Same journal

Nationwide Trends in Opioid Consumption in Costa Rica, 2017-2024: Implications for Regulatory Policy and Public Health.

Pharmacoepidemiology and drug safety·2026

Same journal

Mortality in Castration Resistant Prostate Cancer Patients With and Without Pre-Existing Cardiovascular Disease Receiving Oral Androgen Receptor Pathway Inhibitors.

Pharmacoepidemiology and drug safety·2026

Related Experiment Video

Updated: Jan 31, 2026

A Virtual Machine Platform for Non-Computer Professionals for Using Deep Learning to Classify Biological Sequences of Metagenomic Data

Published on: September 25, 2021

Filling the Gaps in Health Data: Using a Machine Learning Approach to Augment Partially Observed Variables Such as

Stefan Franzen¹, Evangelos Chandakas², Sam Hillman³

¹BPM Evidence Statistics, AstraZeneca, Gothenburg, Sweden.

Pharmacoepidemiology and Drug Safety

January 30, 2026

Summary

This summary is machine-generated.

Transfer learning effectively imputes missing smoking data in claims data, outperforming naive methods when few smokers are recorded. This improves accuracy when adjusting for behavioral confounders in health studies.

Keywords:

claims data, imputation, machine learning, missing data, secondary data, transfer learning

More Related Videos

Project-Based Learning Guidelines for Health Sciences Students: An Analysis with Data Mining and Qualitative Techniques

Published on: December 9, 2022

A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment

Published on: January 11, 2020

Area of Science:

Health Informatics
Biostatistics
Epidemiology

Background:

Real-world claims data often have missing behavioral confounders like smoking status.
A specific pattern, 'missing with truncation', occurs when 'yes' is only partially observed and 'no' is never recorded.
Naively treating missing smoking data as 'no' can cause significant misclassification.
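The 'missing with truncation' pattern and the naive 'missing equals no' rule can be illustrated with a small simulation. This is a minimal sketch assuming a binary ever-smoker indicator (1 = smoker, 0 = non-smoker), a hypothetical prevalence of 40%, and a hypothetical probability q that each true smoker is recorded; it is not the data-generating process used in the study.

```python
import random

def truncate_smoking(true_status, q, seed=0):
    """Apply 'missing with truncation': each true smoker (1) is recorded as 1
    with probability q; unrecorded smokers and all non-smokers are left
    missing (None), so a recorded 'no' never appears."""
    rng = random.Random(seed)
    return [1 if s == 1 and rng.random() < q else None for s in true_status]

def naive_impute(recorded):
    """Naive rule: treat every missing value as 'no' (0)."""
    return [1 if r == 1 else 0 for r in recorded]

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical cohort: 1,000 patients, 40% true ever-smokers, q = 0.5.
truth = [1] * 400 + [0] * 600
recorded = truncate_smoking(truth, q=0.5)
naive = naive_impute(recorded)
print(f"naive accuracy: {accuracy(naive, truth):.3f}")
```

Every error made by the naive rule is an unrecorded smoker, so its expected accuracy is 1 - prevalence × (1 - q), about 0.8 for these hypothetical settings; the misclassification grows as fewer smokers are recorded.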

Purpose of the Study:

To evaluate transfer learning for imputing truncated smoking data in health claims.
To compare transfer learning against a naive approach of treating missing data as absence of risk.
To assess imputation accuracy under varying proportions of observed smokers.

Main Methods:

A case study used data from the NOVELTY study (NCT02760329), comprising 9,733 patients.
An imputation model was trained on one data subset and evaluated on another.
The model's performance was compared using transfer learning versus a naive 'missing equals no' approach, varying the percentage of smokers retained (q).
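The train-on-one-subset, evaluate-on-another design can be sketched as follows. This is a minimal sketch, not the authors' model: a logistic regression (fitted here by plain batch gradient descent) is trained on a hypothetical source subset where smoking status is fully observed, then transferred to a hypothetical target subset with truncated records. Recorded 'yes' values are kept and only the missing entries are filled by the model. The single binary predictor, the toy data, and all function names and hyperparameters are illustrative assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict(model, x, threshold=0.5):
    w, b = model
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= threshold else 0

def transfer_impute(model, X_target, recorded):
    """Keep recorded 'yes' values; fill missing entries with model predictions."""
    return [1 if r == 1 else predict(model, x) for x, r in zip(X_target, recorded)]

def naive_impute(recorded):
    """'Missing equals no' baseline."""
    return [1 if r == 1 else 0 for r in recorded]

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Hypothetical source subset with fully observed smoking status and one
# binary predictor (x = 1 is more common among smokers).
X_src = [[1.0]] * 40 + [[0.0]] * 10 + [[1.0]] * 10 + [[0.0]] * 40
y_src = [1] * 50 + [0] * 50
model = fit_logistic(X_src, y_src)

# Hypothetical target subset: 30 smokers, 70 non-smokers; half of the smokers
# in each predictor group are recorded, everything else is missing (None).
X_tgt = [[1.0]] * 24 + [[0.0]] * 6 + [[1.0]] * 7 + [[0.0]] * 63
truth = [1] * 30 + [0] * 70
recorded = [1] * 12 + [None] * 12 + [1] * 3 + [None] * 3 + [None] * 70

tl_acc = accuracy(transfer_impute(model, X_tgt, recorded), truth)
naive_acc = accuracy(naive_impute(recorded), truth)
print(f"model-based imputation: {tl_acc:.2f}, naive: {naive_acc:.2f}")
```

On this constructed example the model-based fill-in is correct for 90 of 100 patients versus 85 for the naive rule; these figures come from the toy data, not from the study. Keeping observed 'yes' values untouched reflects the truncation pattern, since a recorded smoker is assumed to be a true smoker.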

Main Results:

Transfer learning achieved higher accuracy than the naive approach (0.89 vs. 0.79) for imputing smoking status.
When 90% of smokers were retained (q=90%), transfer learning's accuracy reached 0.94 versus 0.89 for the naive method.
Transfer learning demonstrated superior accuracy when less than 80% of true smokers were recorded.

Conclusions:

Transfer learning offers significant added value for imputing smoking data, especially when few true ever-smokers are recorded.
The benefit of transfer learning depends on the true prevalence of smoking and the predictive model's accuracy.
This method enhances the handling of missing behavioral confounders in large health datasets.
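The dependence on prevalence and model accuracy can be made concrete with a simplified calculation. Assume a true prevalence π, a retention proportion q, and an imputation model whose accuracy on the unrecorded patients is a constant a (a simplifying assumption, not a result from the study). The naive rule is correct for every recorded smoker and every non-smoker, so its accuracy is 1 - π(1 - q). Model-based imputation keeps the πq recorded smokers and is correct on a fraction a of the remaining 1 - πq patients, so it beats the naive rule only when a exceeds the share of non-smokers among unrecorded patients, (1 - π)/(1 - πq). The prevalence of 0.2 below is hypothetical, and the formulas assume πq < 1.

```python
def naive_accuracy(prevalence, q):
    """Accuracy of 'missing equals no': only unrecorded smokers are wrong."""
    return 1.0 - prevalence * (1.0 - q)

def model_accuracy(prevalence, q, a):
    """Recorded smokers are kept; a model with accuracy a fills in the rest."""
    recorded = prevalence * q
    return recorded + (1.0 - recorded) * a

def breakeven_model_accuracy(prevalence, q):
    """Model accuracy on unrecorded patients at which both approaches tie."""
    return (1.0 - prevalence) / (1.0 - prevalence * q)

for q in (0.2, 0.5, 0.8, 0.9):
    print(f"q={q:.1f}  naive={naive_accuracy(0.2, q):.3f}  "
          f"breakeven a={breakeven_model_accuracy(0.2, q):.3f}")
```

Under these assumptions the required model accuracy rises as q increases, which is consistent with the observation that the added value is largest when few true smokers are recorded.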