Prompts to Table: Specification and Iterative Refinement for Clinical Information Extraction with Large Language Models
Summary
This summary is machine-generated. Large language models (LLMs) enable accurate data extraction from pathology reports for kidney tumors. The proposed pipeline efficiently structures complex clinical data to support cancer research.
Area Of Science
- Computational pathology
- Artificial intelligence in medicine
- Natural language processing
Background
- Extracting structured data from unstructured clinical text is challenging.
- Traditional methods face limitations in complex medical domains.
- Pathology reports contain vital information for cancer diagnosis and research.
Purpose Of The Study
- To develop and validate a novel pipeline for accurate information extraction and normalization from unstructured pathology reports using large language models (LLMs).
- To focus initially on kidney tumor reports and demonstrate adaptability to other cancer types.
- To generate analysis-ready tabular data from free-text medical records at scale.
Main Methods
- An end-to-end pipeline leveraging LLMs for information extraction and normalization.
- Flexible prompt templates and direct production of tabular data.
- A human-in-the-loop iterative refinement process guided by an error ontology.
- Validation on 2,297 kidney tumor reports and publicly available breast and prostate cancer reports.
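The summary does not show the authors' prompts or output format. The sketch below only illustrates the general idea of a prompt template paired with a fixed output schema, where each model reply is parsed into one table row and problems are flagged for a human reviewer. The field names, allowed values, JSON reply format, and issue labels (`malformed_output`, `invalid_value:...`) are all hypothetical, and no LLM call is made.

```python
import csv
import io
import json
from string import Template

# Hypothetical schema: each output column maps to its allowed normalized values.
# These fields are illustrative assumptions, not the paper's specification.
SCHEMA = {
    "tumor_subtype": {"clear cell", "papillary", "chromophobe", "other"},
    "metastasis": {"yes", "no"},
}

# Hypothetical prompt template asking the model for a JSON object.
PROMPT = Template(
    "Extract the following fields from the pathology report as a JSON object "
    "with keys $keys. Use null when a field is not reported.\n\nReport:\n$report"
)


def build_prompt(report: str) -> str:
    """Fill the template with the schema's field names and the report text."""
    return PROMPT.substitute(keys=", ".join(SCHEMA), report=report)


def parse_response(raw: str) -> tuple[dict, list[str]]:
    """Turn one model reply into a table row plus a list of flagged issues.

    The issue labels are hypothetical stand-ins for the error categories a
    reviewer might assign during human-in-the-loop refinement.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {field: None for field in SCHEMA}, ["malformed_output"]
    if not isinstance(data, dict):
        return {field: None for field in SCHEMA}, ["malformed_output"]

    row, issues = {}, []
    for field, allowed in SCHEMA.items():
        value = data.get(field)
        if value is None:
            row[field] = None  # missing or null: treated as "not reported"
        elif isinstance(value, str) and value.strip().lower() in allowed:
            row[field] = value.strip().lower()  # normalize case and whitespace
        else:
            issues.append(f"invalid_value:{field}")
            row[field] = None
    return row, issues


def to_csv(rows: list[dict]) -> str:
    """Serialize normalized rows as CSV, one column per schema field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(SCHEMA))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `parse_response('{"tumor_subtype": "Clear Cell", "metastasis": "no"}')` yields a normalized row with no flagged issues, while an out-of-vocabulary value is blanked and flagged so a reviewer can decide whether the prompt or the schema needs refining.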
Main Results
- Achieved a macro-averaged F1 score of 0.99 for kidney tumor subtypes and 0.97 for kidney metastasis detection.
- Demonstrated flexibility with multiple LLM backbones.
- Showcased adaptability to new domains (breast and prostate cancer reports).
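The results are reported as macro-averaged F1, i.e. the unweighted mean over classes of per-class F1 = 2·TP / (2·TP + FP + FN), so rare subtypes count as much as common ones. A minimal sketch of that computation follows; the labels in the usage example are illustrative and are not data from the study. scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value on inputs like these.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute F1 on empty input")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = ["clear cell", "clear cell", "papillary", "chromophobe"]
y_pred = ["clear cell", "papillary", "papillary", "chromophobe"]
print(round(macro_f1(y_true, y_pred), 4))  # per-class F1: 2/3, 2/3, 1 -> 0.7778
```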
Conclusions
- LLM-based pipelines offer a highly accurate and efficient solution for extracting structured data from complex clinical text.
- The developed pipeline successfully structures critical information from pathology reports, facilitating analysis and research.
- Emphasizes the importance of task definition, interdisciplinary collaboration, and complexity management in clinical LLM workflows.