Updated: May 21, 2025

Using Synthetic Health Care Data to Leverage Large Language Models for Named Entity Recognition: Development and

Hendrik Šuvalov¹, Mihkel Lepson¹, Veronika Kukk¹

¹Institute of Computer Science, University of Tartu, Tartu, Estonia.

Journal of Medical Internet Research

March 18, 2025

Summary

This summary is machine-generated.

This study introduces a novel method for creating Estonian medical named entity recognition (NER) models using synthetic data generated by large language models (LLMs). This approach overcomes data scarcity for low-resource languages while preserving patient privacy.

Keywords:

Estonian; LLM; NER; NLP; annotated data; artificial intelligence; clinical decision support; data annotation; data mining; health care data; language model; large language model; machine learning; medical entity; named entity recognition; natural language processing; synthetic data

Area of Science:

Natural Language Processing
Medical Informatics
Computational Linguistics

Background:

Named Entity Recognition (NER) is crucial for extracting medical information from health records.
Developing NER for low-resource languages like Estonian is challenging due to limited annotated data.
Large Language Models (LLMs) show promise for text understanding across languages and domains.

Purpose of the Study:

To develop medical NER models for Estonian, a low-resource language.
To generate synthetic Estonian health data using LLMs for NER model training.
To preserve patient data privacy by avoiding real-world annotated data.

Main Methods:

A three-step pipeline: synthetic data generation (GPT-2), LLM annotation (GPT-3.5-Turbo, GPT-4), and NER model fine-tuning.
Comparison of different LLM prompts and models (GPT-3.5-Turbo, GPT-4, local LLM).
Exploration of the impact of synthetic data volume on NER model performance.
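
Between the annotation and fine-tuning steps listed above, entity mentions returned by an LLM have to be turned into token-level labels before a NER model can be trained on them. The Python sketch below shows one common way to do this with BIO tags. It is an illustration under assumptions, not the authors' implementation: the function name `to_bio`, the whitespace tokenizer, the `DRUG` and `PROCEDURE` label names, and the English example sentence are all hypothetical.

```python
from typing import List, Tuple


def to_bio(text: str, entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (mention, label) pairs into token-level BIO tags.

    Assumption: whitespace tokenization and exact, case-sensitive token
    matching. A real pipeline would use the NER model's own tokenizer.
    """
    tokens = text.split()
    tags = ["O"] * len(tokens)
    for mention, label in entities:
        mention_tokens = mention.split()
        n = len(mention_tokens)
        if n == 0:
            continue
        # Slide a window over the sentence and tag every untagged match.
        for i in range(len(tokens) - n + 1):
            window_free = all(tag == "O" for tag in tags[i:i + n])
            if window_free and tokens[i:i + n] == mention_tokens:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
    return list(zip(tokens, tags))


# Hypothetical LLM annotation for one synthetic sentence.
sentence = "Patient had knee replacement and takes ibuprofen"
annotations = [("knee replacement", "PROCEDURE"), ("ibuprofen", "DRUG")]
for token, tag in to_bio(sentence, annotations):
    print(f"{token}\t{tag}")
```

Exact matching keeps the sketch short, but it silently drops mentions whose surface form differs from the text, which matters for a highly inflected language such as Estonian.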

Main Results:

The methodology shows potential for extracting medical entities from real-world texts.
The best setup achieved an F1-score of 0.69 for drug extraction and 0.38 for procedure extraction.
Performance varies across entity types, with procedures being more complex to extract.
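
The F1-scores reported above are the harmonic mean of precision and recall. The sketch below computes an entity-level F1 over sets of (mention, label) pairs. Exact-match scoring is an assumption, since this summary does not state the matching criterion used in the study; the names `Entity` and `entity_f1` and the example entities are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[str, str]  # (mention text, entity label)


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> float:
    """Entity-level F1, counting only exact (mention, label) matches."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0  # also covers empty gold or predicted sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 2 of 3 predictions are correct, 2 of 3 gold found.
gold = {("ibuprofen", "DRUG"), ("aspirin", "DRUG"), ("metformin", "DRUG")}
predicted = {("ibuprofen", "DRUG"), ("aspirin", "DRUG"), ("insulin", "DRUG")}
print(round(entity_f1(gold, predicted), 3))  # 0.667
```

Scoring each entity type separately, as the results above do for drugs and procedures, only requires filtering both sets by label before calling the function.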

Conclusions:

LLMs can be effectively leveraged with synthetic data to train NER models, preserving patient privacy.
This approach offers a promising solution for developing NER models in low-resource languages like Estonian.
Future work will focus on refining synthetic data generation and expanding applicability to other domains and languages.