



Supplementing claims data analysis using self-reported data to develop a probabilistic phenotype model for current

Jenna M Reps¹, Peter R Rijnbeek², Patrick B Ryan¹

  • ¹ Janssen Research and Development, Titusville, NJ, USA.

Journal of Biomedical Informatics | August 7, 2019
Summary
This summary is machine-generated.

A new model, CROSS, accurately predicts current smoking status using US claims data. This tool helps impute missing smoking information in epidemiological studies, improving research accuracy.

Keywords:
Claims data, Imputation, Patient-level prediction, Probabilistic phenotype, Risk, Smoking


Area of Science:

  • Health Informatics
  • Epidemiology
  • Data Science

Background:

  • Smoking status is often missing in US health insurance claims data.
  • Accurate smoking data is crucial for epidemiological studies and confounder adjustment.
  • The IBM MarketScan Commercial database offers a potential source for smoking status imputation.

Purpose of the Study:

  • To develop a generalizable smoking status phenotype model using US claims data.
  • To investigate the utility of a subset of patients with self-reported smoking status for model training.
  • To create a model that calculates the probability of being a current smoker.

Main Methods:

  • A subset of 1,966,174 patients with linked health risk assessments was used.
  • A regularized logistic regression model, Current Risk of Smoking Status (CROSS), was trained.
  • CROSS utilized 53,027 covariates from the prior 365 days, including demographics, conditions, drugs, and procedures.
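To make the idea of a regularized logistic regression concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the summary does not say which penalty or solver was used, so an L1 penalty fitted by proximal gradient descent is assumed here purely for illustration. The function names, the three binary covariates, and the eight-patient toy data set are all hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.01, lr=0.1, n_iter=2000):
    """Fit L1-regularized logistic regression by proximal gradient descent.

    The intercept b is not penalized. lam, lr and n_iter are illustrative
    defaults, not values taken from the study.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= lr * grad_b
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(X, w, b):
    # Probability of being a current smoker for each row of X.
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

# Hypothetical toy data: each row holds three binary covariates recorded in
# the prior 365 days; only the first one is related to the label.
X = [
    [1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
    [0, 0, 1], [0, 1, 0], [0, 0, 0], [0, 1, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = self-reported current smoker

w, b = fit_l1_logistic(X, y)
probs = predict_proba(X, w, b)
```

The penalty matters at the scale described above: with tens of thousands of candidate covariates, shrinkage keeps most coefficients at or near zero so that the model does not overfit the training subset.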

Main Results:

  • The CROSS model achieved an AUC of 0.76 in internal validation and was well calibrated.
  • External validation across three US claims databases yielded AUCs between 0.82 and 0.87.
  • The model demonstrated transportability across different claims data sources.
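The two quantities reported above, discrimination (AUC) and calibration, can be sketched as follows. This is a generic illustration, not the evaluation code used in the study: the helper names, the equal-size binning scheme, and the six toy predictions are assumptions made for the example.

```python
def auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def calibration_table(y_true, y_score, n_bins=2):
    """Return (mean predicted probability, observed fraction) per bin,
    using equal-size bins of patients sorted by predicted probability."""
    pairs = sorted(zip(y_score, y_true))
    if n_bins < 1 or n_bins > len(pairs):
        raise ValueError("n_bins must be between 1 and the number of patients")
    size = len(pairs) // n_bins
    rows = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any remainder.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[start:end]
        mean_pred = sum(s for s, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows

# Hypothetical predictions for six patients (1 = current smoker).
y_true = [0, 0, 1, 0, 1, 1]
y_score = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

toy_auc = auc(y_true, y_score)                      # 8 of 9 pairs ranked correctly
toy_calibration = calibration_table(y_true, y_score)
```

A model is well calibrated when, in each bin, the mean predicted probability is close to the observed fraction of smokers. AUC alone does not show this, since it depends only on how patients are ranked.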

Conclusions:

  • The CROSS model effectively predicts current smoking status from prior-year claims data.
  • CROSS can be implemented with OMOP common data model-mapped US insurance claims.
  • This model is valuable for imputing smoking status in epidemiological research where smoking is a known confounder.
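One way such a probabilistic phenotype might be used for imputation is sketched below. This is an assumption about usage, not a procedure described in the summary: the recorded status is kept where it exists, and the model's predicted probability stands in where it is missing. The function name, record fields, and probabilities are hypothetical.

```python
from typing import Optional

def impute_smoking(observed: Optional[int], predicted_prob: float) -> float:
    """Return the recorded status (0 or 1) if present, otherwise the
    model's predicted probability of being a current smoker."""
    if not 0.0 <= predicted_prob <= 1.0:
        raise ValueError("predicted_prob must be in [0, 1]")
    return float(observed) if observed is not None else predicted_prob

# Hypothetical patients: "smoker" is None when the claims data lack it.
patients = [
    {"id": "A", "smoker": 1, "prob": 0.81},
    {"id": "B", "smoker": None, "prob": 0.12},
    {"id": "C", "smoker": None, "prob": 0.64},
]

imputed = {p["id"]: impute_smoking(p["smoker"], p["prob"]) for p in patients}
```

The resulting value could then enter a downstream analysis as a covariate for confounder adjustment. Whether to use the probability directly or to threshold it into a binary label is a design choice that the summary does not settle.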