



Generalizable multilingual medical text anonymization using generative instruction tuning.

Chenghao Xiao (1), G Thomas Hudson (2, 3), Matthew Watson (1)

  • (1) Department of Computer Science, Durham University, Durham, UK.

Communications Medicine | June 13, 2026
Summary
This summary is machine-generated.

This study introduces an annotation-free framework for privacy-preserving medical text anonymization using generative large language models (LLMs). The approach effectively removes sensitive data while preserving clinical meaning across diverse medical domains and languages.


Area of Science:

  • Medical Informatics
  • Natural Language Processing
  • Data Privacy

Background:

  • High-quality medical data is crucial for research but contains sensitive patient information.
  • Current anonymization methods are domain-specific, depend on manually annotated data, and are difficult to scale.
  • A scalable, privacy-preserving solution is needed for utilizing unstructured clinical text.

Purpose of the Study:

  • To develop a reproducible, annotation-free framework for training and adapting LLM-based medical text anonymization models.
  • To enable privacy-preserving use of medical text across diverse settings and languages.
  • To reduce reliance on manual annotation and real patient data.

Main Methods:

  • Developed a generative medical anonymization model using synthetic data and instruction tuning of generative LLMs.
  • Created an annotation-free framework for training and adapting models.
  • Evaluated performance on synthetic datasets and real-world patient requests, assessing accuracy, recall, precision, and meaning preservation.
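
One way to picture annotation-free training data of the kind described above is a template-filled instruction pair, where the "label" comes for free because the sensitive values are inserted by the generator. The sketch below is illustrative only and is not the authors' pipeline: the templates, value pools, tag names, and record fields (`instruction`, `input`, `output`, `sensitive_values`) are all assumptions made for this example.

```python
import json
import random

# Hypothetical templates and value pools; none of these come from the study.
TEMPLATES = [
    "Patient {name}, born {dob}, was seen at {hospital} for chest pain.",
    "{name} (DOB {dob}) requests a repeat prescription from {hospital}.",
]
POOLS = {
    "name": ["Alice Smith", "Rahul Mehta", "Chen Wei"],
    "dob": ["02/03/1961", "17/11/1984", "30/06/1975"],
    "hospital": ["Northgate Clinic", "St Mary's Hospital"],
}
# Assumed instruction wording, chosen only for illustration.
INSTRUCTION = "Replace all personally identifying information with category tags."


def make_example(rng):
    """Fill one template twice: once with fake values, once with tags."""
    template = rng.choice(TEMPLATES)
    values = {field: rng.choice(pool) for field, pool in POOLS.items()}
    tags = {field: f"[{field.upper()}]" for field in POOLS}
    return {
        "instruction": INSTRUCTION,
        "input": template.format(**values),    # synthetic "clinical" text
        "output": template.format(**tags),     # target anonymized text
        "sensitive_values": sorted(values.values()),
    }


def make_dataset(n, seed=0):
    """Build n synthetic instruction-tuning records, reproducibly."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


if __name__ == "__main__":
    for record in make_dataset(2):
        print(json.dumps(record))
```

Because every sensitive value is placed by the generator, no manual annotation and no real patient text is needed to obtain input/target pairs. A real system would presumably need far more varied text than two fixed templates.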

Main Results:

  • Generative models trained with the synthetic framework outperformed baseline systems across multiple medical domains.
  • Models achieved high accuracy in anonymizing sensitive information and high fidelity in preserving non-sensitive text.
  • The framework demonstrated effectiveness with small datasets, generalization to unseen fields, and multilingual support without additional training.
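
The two qualities reported above, removing sensitive information and preserving non-sensitive text, can be scored separately. The following is a minimal token-level sketch, assuming the sensitive values for each text are known (as they would be for synthetic data). The function names, the tokenizer, and the exact metric definitions are assumptions for illustration and are not taken from the study.

```python
import re


def tokenize(text):
    """Lowercase tokens; bracketed tags such as [NAME] stay as one token."""
    return re.findall(r"\[\w+\]|[\w/']+", text.lower())


def score(original, anonymized, sensitive_values):
    """Token-level precision, recall, and fidelity for one text pair.

    precision: share of removed tokens that were actually sensitive
    recall:    share of sensitive tokens that were removed
    fidelity:  share of non-sensitive tokens that survived
    """
    orig_tokens = tokenize(original)
    kept = set(tokenize(anonymized))
    sensitive = {tok for value in sensitive_values for tok in tokenize(value)}

    removed = [t for t in orig_tokens if t not in kept]
    true_pos = [t for t in removed if t in sensitive]
    sens_in_orig = [t for t in orig_tokens if t in sensitive]
    nonsens_in_orig = [t for t in orig_tokens if t not in sensitive]
    nonsens_kept = [t for t in nonsens_in_orig if t in kept]

    # Empty denominators are treated as a perfect score by convention here.
    precision = len(true_pos) / len(removed) if removed else 1.0
    recall = len(true_pos) / len(sens_in_orig) if sens_in_orig else 1.0
    fidelity = len(nonsens_kept) / len(nonsens_in_orig) if nonsens_in_orig else 1.0
    return {"precision": precision, "recall": recall, "fidelity": fidelity}


if __name__ == "__main__":
    original = "Patient Alice Smith, born 02/03/1961, was seen for chest pain."
    anonymized = "Patient [NAME], born [DOB], was seen for chest pain."
    print(score(original, anonymized, ["Alice Smith", "02/03/1961"]))
```

Set-based token matching is a simplification: it ignores word order and repeated words, so it would not replace a span-level or semantic evaluation of meaning preservation.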

Conclusions:

  • The study presents a reproducible, annotation-free approach for effective medical text anonymization.
  • This framework reduces the need for real patient data and lowers adaptation costs.
  • It facilitates broader use of unstructured clinical information for research and service improvement.