



Using Synthetic Health Care Data to Leverage Large Language Models for Named Entity Recognition: Development and

Hendrik Šuvalov¹, Mihkel Lepson¹, Veronika Kukk¹

  • ¹Institute of Computer Science, University of Tartu, Tartu, Estonia.

Journal of Medical Internet Research | March 18, 2025
Summary
This summary is machine-generated.

This study introduces a novel method for creating Estonian medical named entity recognition (NER) models using synthetic data generated by large language models (LLMs). This approach overcomes data scarcity for low-resource languages while preserving patient privacy.

Keywords:
Estonian, LLM, NER, NLP, annotated data, artificial intelligence, clinical decision support, data annotation, data mining, health care data, language model, large language model, machine learning, medical entity, named entity recognition, natural language processing, synthetic data


Area of Science:

  • Natural Language Processing
  • Medical Informatics
  • Computational Linguistics

Background:

  • Named Entity Recognition (NER) is crucial for extracting medical information from health records.
  • Developing NER for low-resource languages like Estonian is challenging due to limited annotated data.
  • Large Language Models (LLMs) show promise for text understanding across languages and domains.

Purpose of the Study:

  • To develop medical NER models for Estonian, a low-resource language.
  • To generate synthetic Estonian health data using LLMs for NER model training.
  • To preserve patient data privacy by avoiding real-world annotated data.

Main Methods:

  • A three-step pipeline: synthetic data generation (GPT-2), LLM annotation (GPT-3.5-Turbo, GPT-4), and NER model fine-tuning.
  • Comparison of different LLM prompts and models (GPT-3.5-Turbo, GPT-4, local LLM).
  • Exploration of the impact of synthetic data volume on NER model performance.
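
The hand-off between the annotation step and the fine-tuning step can be illustrated with a small sketch. It assumes the annotator returns entity strings with a label and that the NER model is trained on token-level BIO tags; the summary states neither detail. The function names `tokenize` and `to_bio`, the regex tokenizer, the label names, and the English example sentence are all hypothetical and are not taken from the study.

```python
import re


def tokenize(text):
    """Split text into (token, start, end) triples with a simple regex tokenizer."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def to_bio(text, entities):
    """Convert annotated entity strings into token-level BIO tags.

    entities: list of (surface_string, label) pairs, standing in for the
    output of an LLM annotation step (an assumption for illustration).
    Only the first occurrence of each surface string is tagged.
    """
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for surface, label in entities:
        start = text.find(surface)
        if start == -1:
            continue  # the annotator returned a string that is not in the text
        end = start + len(surface)
        first = True
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            # Tag every token that lies fully inside the entity span.
            if tok_start >= start and tok_end <= end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


# Hypothetical synthetic sentence and annotations.
sentence = "Patient was prescribed ibuprofen and underwent chest radiograph."
annotations = [("ibuprofen", "DRUG"), ("chest radiograph", "PROCEDURE")]
for token, tag in to_bio(sentence, annotations):
    print(f"{token}\t{tag}")
```

Skipping strings that do not occur in the text is one simple way to guard against annotations that do not match the source sentence. Whether the study filtered its LLM annotations this way is not stated here.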

Main Results:

  • The methodology shows potential for extracting medical entities from real-world texts.
  • The best setup achieved an F1-score of 0.69 for drug extraction and 0.38 for procedure extraction.
  • Performance varies across entity types, with procedures being more complex to extract.
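
For readers unfamiliar with the metric, the following sketch shows how a per-entity-type F1-score can be computed from gold and predicted entities. It assumes exact-match scoring on (document id, entity text, label) triples; the summary does not say which matching criterion the study used. The function name `f1_by_label` and the toy data are hypothetical and do not reproduce the reported 0.69 and 0.38.

```python
def f1_by_label(gold, predicted):
    """Return {label: (precision, recall, f1)} under exact-match scoring.

    gold, predicted: sets of (doc_id, entity_text, label) triples.
    """
    scores = {}
    labels = {label for _, _, label in gold | predicted}
    for label in labels:
        g = {e for e in gold if e[2] == label}
        p = {e for e in predicted if e[2] == label}
        true_positives = len(g & p)
        precision = true_positives / len(p) if p else 0.0
        recall = true_positives / len(g) if g else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy data for illustration only.
gold = {
    (0, "ibuprofen", "DRUG"),
    (0, "aspirin", "DRUG"),
    (1, "metformin", "DRUG"),
    (1, "chest radiograph", "PROCEDURE"),
}
predicted = {
    (0, "ibuprofen", "DRUG"),
    (1, "metformin", "DRUG"),
    (1, "insulin", "DRUG"),
    (1, "radiograph", "PROCEDURE"),  # partial match counts as an error here
}
for label, (p, r, f1) in sorted(f1_by_label(gold, predicted).items()):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Under exact matching, a partially overlapping prediction counts as both a false positive and a false negative, so multi-word entity types tend to score lower. This may or may not contribute to the gap between drug and procedure extraction reported above.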

Conclusions:

  • LLMs can be effectively leveraged with synthetic data to train NER models, preserving patient privacy.
  • This approach offers a promising solution for developing NER models in low-resource languages like Estonian.
  • Future work will focus on refining synthetic data generation and expanding applicability to other domains and languages.