Updated: Jan 31, 2026


Filling the Gaps in Health Data: Using a Machine Learning Approach to Augment Partially Observed Variables Such as

Stefan Franzen¹, Evangelos Chandakas², Sam Hillman³

  • ¹BPM Evidence Statistics, AstraZeneca, Gothenburg, Sweden.

Pharmacoepidemiology and Drug Safety | January 30, 2026
Summary
This summary is machine-generated.

Transfer learning effectively imputes missing smoking data in claims, outperforming naive methods when few smokers are recorded. This improves accuracy for behavioral confounder analysis in health studies.

Keywords:
claims data; imputation; machine learning; missing data; secondary data; transfer learning


Area of Science:

  • Health Informatics
  • Biostatistics
  • Epidemiology

Background:

  • Real-world claims data often have missing behavioral confounders like smoking status.
  • A specific pattern, 'missing with truncation,' occurs when 'yes' values are only partially recorded and 'no' values are never recorded.
  • Naively treating missing smoking data as 'no' can cause significant misclassification.
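
The 'missing with truncation' pattern and the naive rule can be illustrated with a minimal sketch. Everything here is hypothetical: the function names, the 40% prevalence, and the simulated cohort are assumptions chosen for illustration and are not taken from the study. Only the retention fraction q mirrors the source.

```python
import random

def truncate_yes(true_smoker, q, rng):
    """Apply 'missing with truncation': each true smoker is recorded as
    True with probability q; every other value is missing (None).
    A 'no' is never recorded."""
    return [True if s and rng.random() < q else None for s in true_smoker]

def naive_impute(observed):
    """Naive rule: treat every missing value as 'no' (False)."""
    return [v is True for v in observed]

def accuracy(predicted, truth):
    """Share of patients whose imputed status matches the true status."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

rng = random.Random(0)
# Hypothetical cohort with roughly 40% true smokers (an assumed value).
truth = [rng.random() < 0.4 for _ in range(10_000)]
for q in (0.3, 0.6, 0.9):
    observed = truncate_yes(truth, q, rng)
    print(q, round(accuracy(naive_impute(observed), truth), 3))
```

Every unrecorded smoker becomes a false 'no' under the naive rule, so its accuracy falls as q decreases. The printed values describe the simulated cohort only and are not comparable to the study's reported accuracies.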

Purpose of the Study:

  • To evaluate transfer learning for imputing truncated smoking data in health claims.
  • To compare transfer learning against a naive approach of treating missing data as absence of risk.
  • To assess imputation accuracy under varying proportions of observed smokers.

Main Methods:

  • A case study utilized data from the NOVELTY study (NCT02760329) with 9733 patients.
  • An imputation model was trained on one data subset and evaluated on another.
  • Transfer learning was compared with a naive 'missing equals no' approach while varying the percentage of true smokers retained in the data (q).
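
The evaluation design described above (fit on one subset, score on another, vary q) can be sketched as follows. This is not the study's model: a one-covariate decision stump stands in for the imputation model, whose details are not given here, and the simulated covariate, sample sizes, and 50% prevalence are all assumptions made for illustration.

```python
import random

def simulate(n, prevalence, rng):
    """Hypothetical cohort of (covariate, smoker) pairs; smokers have a
    higher mean covariate value (an assumed, made-up relationship)."""
    rows = []
    for _ in range(n):
        smoker = rng.random() < prevalence
        rows.append((rng.gauss(2.0 if smoker else 0.0, 1.0), smoker))
    return rows

def fit_stump(rows):
    """Choose the threshold t that maximizes training accuracy of 'x >= t'."""
    def acc(t):
        return sum((x >= t) == s for x, s in rows) / len(rows)
    return max((x for x, _ in rows), key=acc)

def evaluate(rows, q, threshold, rng):
    """Truncate the evaluation subset at retention q, then score both rules."""
    naive_hits = model_hits = 0
    for x, smoker in rows:
        recorded = smoker and rng.random() < q   # only a 'yes' is ever recorded
        naive = recorded                         # missing treated as 'no'
        model = recorded or x >= threshold       # missing filled by the stump
        naive_hits += naive == smoker
        model_hits += model == smoker
    n = len(rows)
    return naive_hits / n, model_hits / n

rng = random.Random(42)
train = simulate(500, 0.5, rng)     # subset with fully known smoking status
target = simulate(2000, 0.5, rng)   # subset used for evaluation
t = fit_stump(train)
for q in (0.1, 0.5, 0.9):
    naive_acc, model_acc = evaluate(target, q, t, rng)
    print(f"q={q:.1f}  naive={naive_acc:.3f}  model={model_acc:.3f}")
```

Recorded 'yes' values are kept as they are and the fitted rule is applied only to missing entries, which is one plausible reading of the comparison. The study's actual transfer learning setup may differ.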

Main Results:

  • Transfer learning achieved higher accuracy (0.89) than the naive approach (0.79) for imputing smoking status.
  • When 90% of smokers were retained (q=90%), transfer learning's accuracy reached 0.94 versus 0.89 for the naive method.
  • Transfer learning demonstrated superior accuracy when fewer than 80% of true smokers were recorded.

Conclusions:

  • Transfer learning offers significant added value for imputing smoking data, especially when few true ever-smokers are recorded.
  • The benefit of transfer learning depends on the true prevalence of smoking and the predictive model's accuracy.
  • This method enhances the handling of missing behavioral confounders in large health datasets.
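
The dependence on prevalence and model accuracy noted in the conclusions can be made concrete with a simplified derivation. It is offered as an illustration under stated assumptions, not as the study's analysis: accuracy is measured over all patients, each true smoker is recorded with probability q independently of everything else, recorded 'yes' values are kept, and the model is applied only to missing values. The symbols p (true smoking prevalence), Se (model sensitivity), and Sp (model specificity) are introduced here and do not appear in the source.

```latex
\begin{align*}
\mathrm{Acc}_{\text{naive}} &= (1-p) + pq = 1 - p(1-q),\\
\mathrm{Acc}_{\text{model}} &= pq + (1-p)\,\mathrm{Sp} + p(1-q)\,\mathrm{Se},\\
\mathrm{Acc}_{\text{model}} - \mathrm{Acc}_{\text{naive}} &= p(1-q)\,\mathrm{Se} - (1-p)(1-\mathrm{Sp}).
\end{align*}
```

Under these assumptions the model-based imputation wins when the smokers it recovers, p(1-q)Se, outnumber the non-smokers it mislabels, (1-p)(1-Sp). Equivalently, it wins when q < 1 - (1-p)(1-Sp)/(p Se). The advantage therefore shrinks as q approaches 1 and grows with higher prevalence and a more accurate model.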