DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization.

John X Morris¹, Thomas R Campion¹, Sri Laasya Nutheti¹

¹Cornell Tech, New York, NY.

AMIA Joint Summits on Translational Science Proceedings. AMIA Joint Summits on Translational Science

June 12, 2025

Summary

This summary is machine-generated.

Current deidentification methods fail to fully protect patient privacy in clinical notes. An adversarial large language model (LLM) approach successfully re-identified 9% of notes, revealing weaknesses in existing tools.

Area of Science:

Biomedical Informatics
Natural Language Processing
Data Privacy

Background:

Sharing clinical data, which often contains protected health information (PHI), is vital for biomedical research.
Deidentification removes PHI from clinical text before the data are distributed.
Current deidentification methods are often evaluated on limited datasets, so reported results may overestimate real-world performance.
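To make the background concrete, the following minimal Python sketch masks a few PHI patterns with regular expressions. It is a hypothetical illustration in the spirit of pattern-matching tools such as Philter, not Philter's actual rules. The patterns, the `deidentify` helper, and the `[MASK]` placeholder are all assumptions, and real tools cover many more PHI categories.

```python
import re

# Hypothetical patterns for a few PHI categories (illustration only); real
# deidentification tools cover many more (names, addresses, ages, etc.).
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b"),
}


def deidentify(note: str, known_names: list[str]) -> str:
    """Replace pattern matches and known full names with [MASK] (hypothetical helper)."""
    for pattern in PHI_PATTERNS.values():
        note = pattern.sub("[MASK]", note)
    for name in known_names:
        # Escape the name so any regex metacharacters are matched literally.
        note = re.sub(re.escape(name), "[MASK]", note, flags=re.IGNORECASE)
    return note


note = "Jane Doe (MRN: 12345) seen 03/14/2024, call 555-123-4567. Jane reports dizziness."
print(deidentify(note, ["Jane Doe"]))
```

The standalone first name "Jane" survives masking because only the full name was listed. Residual identifiers like this are what an adversarial re-identifier can exploit.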

Purpose of the Study:

To develop and evaluate an adversarial method that uses a large language model (LLM) to re-identify patients from deidentified clinical notes.
To assess how well state-of-the-art deidentification tools withstand a re-identification attack.
To expose limitations of current deidentification technologies and provide a tool for improving them iteratively.

Main Methods:

Developed an adversarial re-identification approach based on a large language model (LLM).
Introduced the De-Identification/Re-Identification (DIRI) method for evaluating the performance of deidentification tools.
Tested the method on Weill Cornell Medicine clinical notes anonymized with three deidentification tools: Philter, BiLSTM-CRF, and ClinicalBERT.
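The evaluation idea can be sketched in Python with a deliberately simplified stand-in for the LLM. The hypothetical `reidentify` function below scores a masked note against each candidate patient profile by token overlap (Jaccard similarity) and predicts the best match; the re-identification rate is the fraction of notes whose prediction is the true patient. The profile format, scoring function, and toy data are assumptions for illustration only and do not reproduce the authors' model. Under this kind of evaluation, a lower rate suggests stronger deidentification.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens, ignoring the [MASK] placeholder."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t != "mask"}


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity; 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def reidentify(note: str, profiles: dict[str, str]) -> str:
    """Hypothetical stand-in for an LLM re-identifier: pick the best-matching profile.

    Ties resolve to the first profile in insertion order, so an uninformative
    note is effectively a guess.
    """
    note_toks = tokens(note)
    return max(profiles, key=lambda pid: jaccard(note_toks, tokens(profiles[pid])))


def reidentification_rate(notes: list[tuple[str, str]], profiles: dict[str, str]) -> float:
    """Fraction of (true_id, masked_note) pairs whose top prediction is correct."""
    hits = sum(reidentify(note, profiles) == true_id for true_id, note in notes)
    return hits / len(notes)


# Toy data (invented for illustration): candidate profiles and masked notes.
profiles = {
    "p1": "female 67 atrial fibrillation warfarin Queens",
    "p2": "male 45 type 2 diabetes metformin Brooklyn",
    "p3": "female 30 asthma albuterol Manhattan",
}
masked_notes = [
    ("p1", "[MASK] is a 67 year old female on warfarin for atrial fibrillation."),
    ("p2", "[MASK] presents for diabetes follow-up; continue metformin."),
    ("p3", "[MASK] seen in clinic, no acute complaints."),
]
print(reidentification_rate(masked_notes, profiles))
```

Here the first two notes leak enough clinical detail to be matched, while the third shares no tokens with any profile, so every profile ties at zero and `max` falls back to the first one, a wrong guess. The rate is therefore 2/3.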

Main Results:

Even against the most effective deidentification tool (ClinicalBERT), the LLM-based re-identification model re-identified 9% of clinical notes.
This demonstrates significant weaknesses in current deidentification technologies.
The DIRI method provides a robust evaluation framework for deidentification tools.

Conclusions:

Existing deidentification technologies exhibit significant vulnerabilities.
The developed LLM-based re-identification method can effectively challenge and expose these weaknesses.
Continuous improvement and novel approaches are necessary to ensure robust patient privacy in biomedical research.