Knowledge-guided generative artificial intelligence for automated taxonomy learning from drug labels
Summary
This summary is machine-generated. Generative AI combined with real-world evidence (RWE) was used to build a drug indication taxonomy from drug labels. Large Language Models are good at deriving concept hierarchies but struggle with concept-to-term subsumption relations, a task that is also difficult for human experts.
Area Of Science
- Computational linguistics
- Bioinformatics
- Artificial Intelligence in Medicine
Background
- Drug labels contain crucial but unstructured indication information.
- Automating the extraction and organization of this information is essential for drug discovery and pharmacovigilance.
- Existing methods for taxonomy construction are often manual and time-consuming.
Purpose Of The Study
- To develop an automated method for constructing a drug indication taxonomy.
- To apply generative Artificial Intelligence (AI), specifically Large Language Models (LLMs) such as GPT-4, together with real-world evidence (RWE) to this task.
- To evaluate the performance of the AI-driven taxonomy against domain expert expectations.
Main Methods
- Extracted 2909 drug indication terms from 46,421 drug labels using GPT-4.
- Iteratively generated indication concepts and inferred subsumption relations using GPT-4 integrated with RWE.
- Constructed a hierarchical drug indication taxonomy.
- Performed quantitative and qualitative evaluations with domain experts for cardiovascular, endocrine, and genitourinary diseases.
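The iterative construction step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the root label, and the toy subsumption table are all hypothetical, and the `subsumes` callback stands in for whatever subsumption check the study performed with GPT-4 and RWE. Each term is placed by descending from the root to the deepest concept that subsumes it.

```python
from typing import Callable, Dict, List

# (parent, child) -> True if the parent concept subsumes the child term.
# In the study this judgment came from GPT-4 with RWE; here it is a stub.
Subsumes = Callable[[str, str], bool]


def insert_term(children: Dict[str, List[str]], root: str,
                term: str, subsumes: Subsumes) -> str:
    """Attach `term` under the deepest existing concept that subsumes it.

    Returns the concept chosen as the parent.
    """
    node = root
    while True:
        # Look for a child of the current node that subsumes the term.
        deeper = next((c for c in children.get(node, [])
                       if c != term and subsumes(c, term)), None)
        if deeper is None:
            break
        node = deeper
    if term not in children.setdefault(node, []):
        children[node].append(term)
    children.setdefault(term, [])
    return node


def build_taxonomy(terms: List[str], top_level: List[str],
                   subsumes: Subsumes,
                   root: str = "drug indication") -> Dict[str, List[str]]:
    """Build a parent -> children map from top-level categories and terms."""
    children: Dict[str, List[str]] = {root: list(top_level)}
    for concept in top_level:
        children.setdefault(concept, [])
    for term in terms:
        insert_term(children, root, term, subsumes)
    return children


# Hypothetical toy oracle: each term mapped to the concepts that subsume it.
TOY_ANCESTORS = {
    "hypertension": {"cardiovascular diseases"},
    "pulmonary arterial hypertension": {"hypertension",
                                        "cardiovascular diseases"},
    "type 2 diabetes mellitus": {"endocrine diseases"},
}


def toy_subsumes(parent: str, child: str) -> bool:
    return parent in TOY_ANCESTORS.get(child, set())
```

With this greedy descent the result depends on insertion order: a specific term inserted before its more general parent ends up as that parent's sibling. This is one reason an iterative procedure that revisits earlier placements is useful.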
Main Results
- Created a drug indication taxonomy with 24 high-level categories and detailed sub-taxonomies (e.g., 242 concepts in the cardiovascular disease sub-taxonomy).
- The taxonomy covers 234 indication terms associated with 189 drugs.
- GPT-4 achieved accuracy above 0.7 in determining the drug indication hierarchy, with good inter-rater reliability.
- Concept-to-term subsumption relation checking showed fair to moderate reliability.
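Labels such as "fair" and "moderate" are commonly read against the Landis and Koch bands for an agreement statistic such as Cohen's kappa. This summary does not say which statistic the study used, so the sketch below only assumes kappa for illustration. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable],
                rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k]
              for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Map kappa to the conventional Landis and Koch description."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For two experts judging whether each proposed subsumption relation is valid, `landis_koch(cohen_kappa(expert_1, expert_2))` gives the verbal label, where `expert_1` and `expert_2` are equal-length lists of judgments.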
Conclusions
- Generative AI (LLMs) and RWE can successfully create drug indication taxonomies consistent with expert expectations.
- LLMs are adept at deriving concept hierarchies but face challenges in determining concept-to-term subsumption relations in free-text labels.
- The limitations in relation checking mirror difficulties faced by human experts.

