Related Concept Videos

Genome Annotation and Assembly

The genome refers to all of the genetic material in an organism. It can range from a few million base pairs in microbial cells to several billion base pairs in many eukaryotic organisms. Genome assembly is the process of taking DNA sequencing data and putting it back together in the correct order to create a close representation of the original genome. Assembly is followed by genome annotation: the identification of functional elements on the newly assembled genome.

Genomics

Genomics is the science of genomes: the study of all the genetic material of an organism. In humans, the genome consists of the information carried on 23 pairs of chromosomes in the nucleus, as well as mitochondrial DNA. In genomics, both coding and non-coding DNA are sequenced and analyzed. Genomics allows a better understanding of all living things, their evolution, and their diversity. It has a myriad of uses: for example, to build phylogenetic trees, to improve productivity and...

Leaky Scanning

During most eukaryotic translation, the small 40S ribosomal subunit scans an mRNA from its 5' end until it encounters the first AUG start codon. The large 60S ribosomal subunit then joins the small subunit to initiate protein synthesis. Because an mRNA may contain multiple potential translation initiation sites, where translation begins is largely determined by the nucleotides surrounding the start codon. Marilyn Kozak discovered that the sequence RCCAUGG (where R...

Genome Size and the Evolution of New Genes

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Synthesized annotation guidelines are knowledge-lite boosters for clinical information extraction.

Journal of the American Medical Informatics Association : JAMIA·2026

Same author

Education Research: Integration of Trainee and Faculty Clinics at an Academic Medical Center: Improving Quality of Care and Education.

Neurology. Education·2026

Same author

Hootation: A GUI and API library for ontology validation and verbalization.

Proceedings. IEEE International Conference on Semantic Computing·2026

Same author

Impact of Prescribed and Self-Selected Music Interventions on Stress, Sleep, Heart Rate Variability, and Brain Connectivity in Surgeons Using 7-Tesla Functional Magnetic Resonance Imaging and Wearable Actigraphy: Multimodal Feasibility Randomized Controlled Trial.

JMIR formative research·2026

Same author

Clinical document metadata extraction: A scoping review.

Journal of biomedical informatics·2026

Same author

Facilitating Clinical Information Extraction with Synthetic Data and Ontology using Large Language Models.

AMIA ... Annual Symposium proceedings. AMIA Symposium·2026

Same journal

DataAtlas: automatic generation of data dictionaries using large language models.

JAMIA open·2026

Same journal

An examination of the availability and characteristics of social needs data in the electronic health records: a path to social data harmonization and standardization at Johns Hopkins medicine.

JAMIA open·2026

Same journal

Generative artificial intelligence implementation in REDCap.

JAMIA open·2026

Same journal

Improving readability of layperson abstracts and summaries in oncology using task-specific large language model powered tool: results from the BRIDGE-AI 7 study.

JAMIA open·2026

Same journal

Accuracy of administrative data in ascertaining health conditions: a systematic review.

JAMIA open·2026

Same journal

Building a consumer health informatics introductory course consensus curriculum: an eDelphi study.

JAMIA open·2026

Related Experiment Video

Updated: May 22, 2025

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

LLM-IE: a python package for biomedical generative information extraction with large language models.

Enshuo Hsu¹,², Kirk Roberts¹

¹McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, United States.

March 13, 2025

Summary

This summary is machine-generated.

A new Python package, LLM-IE, simplifies biomedical information extraction with large language models (LLMs). It offers tools for prompt engineering and for building extraction pipelines, and its sentence-based prompting algorithm achieved over 70% strict F1 for entity extraction on the i2b2 clinical datasets.

Keywords:

information extraction; large language models; named entity recognition; natural language processing; relation extraction

More Related Videos

Author Spotlight: Impact of Intergenic Interactions on Disease-Identifying Dark Biomarkers

Published on: March 1, 2024

High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions

Published on: March 5, 2022

Area of Science:

Biomedical Informatics
Natural Language Processing

Background:

Large language models (LLMs) show promise for biomedical information extraction (IE).
Challenges in prompt engineering and algorithm development limit the application of LLMs to IE.
Dedicated software for building comprehensive IE pipelines is lacking.

Purpose of the Study:

To develop a user-friendly Python package, LLM-IE, for constructing end-to-end biomedical information extraction pipelines.
To address the persistent challenges in prompt engineering and algorithm design for LLM-based IE.
To provide essential building blocks for robust and efficient IE system development.
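
Prompt engineering for LLM-based IE usually means turning an extraction schema (the entity types and their definitions) plus a few worked demonstrations into a single prompt; the results below report an 8-shot setting, i.e. eight such demonstrations. The sketch assumes the model is asked to answer with a JSON list of {"entity_text", "type"} objects. The names EntityType and build_ner_prompt are hypothetical illustrations, not LLM-IE's API or its built-in prompt template.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EntityType:
    """One entry in a hypothetical extraction schema."""
    name: str
    definition: str


def build_ner_prompt(schema: List[EntityType],
                     examples: List[Tuple[str, list]],
                     text: str) -> str:
    """Assemble a few-shot NER prompt (hypothetical helper, not LLM-IE's API).

    schema   -- entity types to extract, each with a short definition
    examples -- (input_text, expected_entities) pairs used as demonstrations
    text     -- the passage the model should annotate
    """
    lines = ["Extract the following entity types from the text.",
             'Return a JSON list of objects with keys "entity_text" and "type".']
    # Schema definitions tell the model what each label means.
    lines += [f"- {e.name}: {e.definition}" for e in schema]
    # Each demonstration pairs an input with the exact JSON we expect back.
    for i, (ex_text, ex_entities) in enumerate(examples, start=1):
        lines.append(f"\nExample {i}\nText: {ex_text}")
        lines.append("Output: " + json.dumps(ex_entities))
    # The target passage comes last, leaving the answer slot open.
    lines.append(f"\nText: {text}\nOutput:")
    return "\n".join(lines)
```

Keeping the schema and demonstrations as data, rather than hand-editing prompt strings, makes it straightforward to vary the number of shots or swap entity definitions when comparing prompt designs.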

Main Methods:

Developed LLM-IE, a Python package supporting named entity recognition, entity attribute extraction, and relation extraction.
Implemented an interactive LLM agent for schema definition and prompt design.
Incorporated state-of-the-art prompting algorithms and visualization features.
Benchmarked LLM-IE performance on the i2b2 clinical datasets.
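
The general idea behind sentence-based prompting is to send the model one sentence at a time instead of the whole note, then map each returned entity string back to character offsets in the original document. The sketch below is a minimal illustration under stated assumptions: the model is any callable that maps a prompt to a text reply, it answers with a JSON list of {"entity_text", "type"} objects, and a naive regular-expression splitter stands in for a proper sentence tokenizer. The functions split_sentences and extract_by_sentence are hypothetical and do not reproduce LLM-IE's implementation.

```python
import json
import re
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt string, returns the raw model output.
LLMFn = Callable[[str], str]


def split_sentences(text: str) -> List[Tuple[int, int]]:
    """Naive splitter: returns (start, end) character offsets of each sentence."""
    spans = []
    for m in re.finditer(r"[^.!?]+[.!?]*", text):
        start, end = m.start(), m.end()
        # Trim surrounding whitespace while keeping offsets relative to the document.
        while start < end and text[start].isspace():
            start += 1
        while end > start and text[end - 1].isspace():
            end -= 1
        if start < end:
            spans.append((start, end))
    return spans


def extract_by_sentence(text: str, llm: LLMFn) -> List[Tuple[int, int, str]]:
    """Prompt the model one sentence at a time and map answers to document offsets."""
    results = []
    for s_start, s_end in split_sentences(text):
        sentence = text[s_start:s_end]
        prompt = ('Return a JSON list of {"entity_text", "type"} objects '
                  f"for this sentence:\n{sentence}")
        try:
            entities = json.loads(llm(prompt))
        except json.JSONDecodeError:
            continue  # skip sentences whose output is not valid JSON
        for ent in entities if isinstance(entities, list) else []:
            if not isinstance(ent, dict) or not ent.get("entity_text"):
                continue  # ignore malformed or empty entries
            # Search only inside this sentence, then shift to document offsets.
            for m in re.finditer(re.escape(ent["entity_text"]), sentence):
                results.append((s_start + m.start(), s_start + m.end(), ent.get("type")))
    return results
```

Restricting the string search to the current sentence narrows where a returned entity can be located, which matters for span-level scoring; the trade-off is one model call per sentence instead of one per document.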

Main Results:

The sentence-based prompting algorithm achieved over 70% strict F1 for entity extraction in an 8-shot setting.
The system achieved approximately 60% F1 for entity attribute extraction.
LLM-IE successfully supports key IE tasks including NER, entity attribute extraction, and relation extraction.
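
The summary does not define "strict" F1; the sketch below assumes the usual reading, in which a prediction counts as correct only when it matches a gold annotation exactly (same span boundaries and same type). The function strict_f1 is a hypothetical helper, not the evaluation code used in the study. Items are compared as hashable tuples, so the same function can score entities as (start, end, type) or attributes as (start, end, attribute, value).

```python
from typing import Hashable, Iterable


def strict_f1(gold: Iterable[Hashable], pred: Iterable[Hashable]) -> float:
    """Exact-match F1 over sets of annotation tuples (hypothetical helper)."""
    gold_set, pred_set = set(gold), set(pred)
    # True positives: predictions identical to a gold item in every field.
    tp = len(gold_set & pred_set)
    if tp == 0:
        return 0.0  # also avoids division by zero when either set is empty
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For example, if the gold set holds three spans and the model returns two, one exact and one with an end offset off by a single character, precision is 1/2, recall is 1/3, and F1 is 0.4. Adding a document identifier to each tuple lets spans from many notes be pooled into one micro-averaged score; because sets are used, duplicate predictions are counted once.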

Conclusions:

LLM-IE provides a foundational toolkit for developing advanced biomedical information extraction pipelines.
The package facilitates schema definition and prompt design, and it implements effective prompting algorithms.
Future work will focus on expanding LLM-IE's capabilities and enhancing its computational efficiency.