Privacy-Enhancing Sequential Learning under Heterogeneous Selection Bias in Multi-Site EHR Data.

Ritoban Kundu, Xu Shi, Kumar Kshitij Patel

    medRxiv: the Preprint Server for Health Sciences
    |October 3, 2025
    PubMed
    Summary
    This summary is machine-generated.

    New privacy-preserving statistical methods enable disease risk modeling across electronic health record (EHR) sites. These methods accurately estimate smoking-cancer associations without sharing sensitive patient data.

    Area of Science:

    • Biostatistics
    • Epidemiology
    • Health Informatics

    Background:

    • Electronic health record (EHR) data is crucial for disease research but often siloed across institutions with varying recruitment strategies.
    • Centralized analysis is infeasible due to data heterogeneity and privacy concerns, hindering large-scale collaborative research.
    • Developing privacy-enhancing methods is essential for leveraging distributed EHR data.

    Purpose of the Study:

    • To develop and validate privacy-enhancing statistical methods for estimating disease risk model parameters across multiple EHR sites with heterogeneous selection.
    • To enable collaborative research without sharing raw individual-level data, addressing privacy and data access challenges.
    • To apply these methods to a cross-biobank analysis of smoking and cancer subtypes.
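The idea of collaborating without sharing raw individual-level data can be illustrated with a toy example. The sketch below is not the study's estimator: it only shows, under made-up data, how sites might pass a small running aggregate from one to the next so that the final estimate matches a pooled calculation. The site names, outcomes, weights, and the `update` helper are all hypothetical.

```python
# Illustrative only: sites pass running aggregates, never patient-level rows.
# Site names, outcomes, and weights below are made-up numbers.

def update(state, outcomes, weights):
    """Add one site's weighted contribution to the running (numerator, denominator)."""
    num, den = state
    num += sum(w * y for y, w in zip(outcomes, weights))
    den += sum(weights)
    return (num, den)

# Hypothetical sites: binary outcomes and per-person weights.
sites = [
    {"name": "site_A", "y": [1, 0, 0, 1], "w": [2.0, 5.0, 5.0, 2.0]},
    {"name": "site_B", "y": [0, 0, 1], "w": [4.0, 4.0, 1.0]},
]

state = (0.0, 0.0)
for site in sites:
    # Only the two-number state leaves each site.
    state = update(state, site["y"], site["w"])

sequential_estimate = state[0] / state[1]

# Check against a pooled (centralized) computation on the same toy data.
all_y = [y for s in sites for y in s["y"]]
all_w = [w for s in sites for w in s["w"]]
pooled_estimate = sum(w * y for y, w in zip(all_y, all_w)) / sum(all_w)
```

For a weighted mean the sequential and pooled results coincide exactly. Regression parameters generally need richer summaries than two numbers, so this should be read as a picture of the communication pattern only.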

    Main Methods:

    • Proposed two decentralized sequential estimators: Sequential Pseudo-likelihood (SPL) and Sequential Augmented Inverse Probability Weighting (SAIPW).
    • Utilized external population-level information to adjust for selection bias and ensure valid variance estimation.
    • Compared SPL and SAIPW against existing methods (SUW, centralized, meta-learning) via simulations and applied them to harmonized data from the Michigan Genomics Initiative (MGI) and the NIH All of Us (AOU) program.
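To see why weighting by selection probabilities can remove selection bias, consider a minimal 2x2 sketch. It is a generic inverse probability weighting (IPW) illustration, not the SPL or SAIPW estimator, and it assumes the selection probabilities are known exactly. Every count and probability below is invented for illustration.

```python
# Illustrative only: a 2x2 table version of inverse probability weighting (IPW).
# All counts and selection probabilities below are made-up numbers.

def odds_ratio(table):
    """Odds ratio from a dict keyed by (exposed, diseased) -> count or weight."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])

# Hypothetical target population: (smoker, cancer) -> number of people.
population = {(1, 1): 300, (1, 0): 2700, (0, 1): 200, (0, 0): 6800}

# Hypothetical probability that a person in each cell is recruited into the cohort.
# Selection depends on both exposure and outcome, which distorts the odds ratio.
selection_prob = {(1, 1): 0.60, (1, 0): 0.10, (0, 1): 0.30, (0, 0): 0.20}

# Expected cohort counts under that selection mechanism.
cohort = {cell: population[cell] * selection_prob[cell] for cell in population}

# IPW: each recruited person counts for 1 / P(selected), restoring population totals.
weighted = {cell: cohort[cell] / selection_prob[cell] for cell in cohort}

or_population = odds_ratio(population)
or_unweighted = odds_ratio(cohort)
or_weighted = odds_ratio(weighted)
```

In this toy setup the unweighted cohort odds ratio is inflated by the factor (0.60 × 0.20) / (0.10 × 0.30) = 4 relative to the population value, while the weighted table recovers the population odds ratio. In practice the selection probabilities are unknown and must be estimated, which is where external population-level information comes in.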

    Main Results:

    • The Sequential Unweighted (SUW) estimator, which ignores selection, showed substantial bias and poor coverage in simulations.
    • SPL and SAIPW provided unbiased estimates with valid coverage, with SAIPW demonstrating robustness to selection model misspecification.
    • Decentralized methods showed comparable efficiency to centralized approaches, outperforming meta-learning in smaller cohorts.
    • Real-data analysis confirmed strong smoking-cancer associations for lung, bladder, and larynx cancers.
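Robustness to selection-model misspecification is a general property of augmented IPW estimators. As a hedged illustration, the textbook form for a population mean is given here. It is not the SAIPW estimator from the study, and all symbols are assumptions introduced for this example: $S_i$ is the selection indicator, $Y_i$ the outcome, $X_i$ the covariates, $\pi(X_i)$ a modeled selection probability, and $m(X_i)$ a modeled outcome mean.

```latex
\hat{\mu}_{\mathrm{AIPW}}
  = \frac{1}{N} \sum_{i=1}^{N}
    \left[
      \frac{S_i \, Y_i}{\pi(X_i)}
      - \frac{S_i - \pi(X_i)}{\pi(X_i)} \, m(X_i)
    \right]
```

Under the usual conditions (selection ignorable given $X_i$, and $\pi(X_i) > 0$), this estimator remains consistent if either $\pi$ or $m$ is correctly specified, which is the sense in which an augmented estimator can tolerate a misspecified selection model.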

    Conclusions:

    • The developed framework facilitates valid, privacy-enhancing statistical inference across heterogeneous EHR cohorts.
    • It enables scalable, decentralized research on real-world data while preserving individual privacy.
    • It supports robust estimation of disease risk associations in multi-site studies.