Improving Automated Deep Phenotyping Through Large Language Models Using Retrieval Augmented Generation
Summary
This summary is machine-generated. RAG-HPO improves rare genetic disorder diagnosis by accurately assigning Human Phenotype Ontology (HPO) terms using retrieval-augmented generation, surpassing existing tools.
Area Of Science
- Computational biology and bioinformatics
- Genomics and genetic disorder research
- Natural Language Processing (NLP) in clinical settings
Background
- Accurate diagnosis of rare genetic disorders requires precise phenotypic and genotypic analysis.
- Human Phenotype Ontology (HPO) provides a standardized language for clinical phenotypes.
- Existing HPO tools (Doc2HPO, ClinPhen) struggle with incomplete assignments and require manual review; LLMs are prone to hallucinations.
Purpose Of The Study
- To present RAG-HPO, a novel Python-based tool leveraging Retrieval-Augmented Generation (RAG) for accurate HPO term assignment.
- To enhance LLM accuracy in HPO term extraction without fine-tuning, addressing limitations of current methods.
Main Methods
- RAG-HPO utilizes a dynamic vector database containing over 54,000 phenotypic phrases mapped to HPO IDs.
- The workflow involves LLM extraction of phenotypic phrases, semantic similarity matching against the vector database, and LLM-based HPO term assignment.
- Performance was benchmarked against Doc2HPO, ClinPhen, and FastHPOCR using 120 case reports with 1,792 manually assigned HPO terms.
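The retrieval step in this workflow can be illustrated with a minimal sketch. It is not the RAG-HPO implementation: the six-entry `PHRASE_DB`, the bag-of-words `embed` function, and the names `cosine` and `retrieve_candidates` are all hypothetical stand-ins. A real system would use a sentence-embedding model and a vector database, and would pass the retrieved candidates to an LLM for the final term assignment.

```python
import math
from collections import Counter

# Toy stand-in for the vector database: phenotypic phrase -> HPO ID.
# (Assumption: the real database holds over 54,000 phrases.)
PHRASE_DB = {
    "seizure": "HP:0001250",
    "epileptic seizures": "HP:0001250",
    "microcephaly": "HP:0000252",
    "small head circumference": "HP:0000252",
    "low muscle tone": "HP:0001252",
    "global developmental delay": "HP:0001263",
}


def embed(text):
    """Bag-of-words vector; a placeholder for a learned sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(phrase, top_k=3):
    """Return up to top_k (db_phrase, hpo_id, score) tuples, best match first."""
    query = embed(phrase)
    scored = [(cosine(query, embed(p)), p, hpo) for p, hpo in PHRASE_DB.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Drop entries with no overlap so the downstream LLM only sees real candidates.
    return [(p, hpo, score) for score, p, hpo in scored[:top_k] if score > 0]


if __name__ == "__main__":
    # Phrases as an LLM might extract them from a case report (hypothetical input).
    for extracted in ["low muscle tone", "recurrent seizures"]:
        print(extracted, "->", retrieve_candidates(extracted))
```

Restricting the final assignment to retrieved candidates is what limits hallucinated terms in a design like this: the LLM chooses among existing HPO IDs rather than generating identifiers freely.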
Main Results
- RAG-HPO, powered by Llama-3 70B, achieved a mean precision of 0.84, recall of 0.78, and F1 score of 0.80 on 120 case reports.
- These scores were significantly higher than those of the conventional tools (p<0.00001).
- False positive HPO term identification was low (15.8%), with minimal hallucinations (2.7%).
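For readers unfamiliar with these metrics, the sketch below shows one common way to compute precision, recall, and F1 for a single case report and then average them across reports. The summary does not state how the study aggregated its scores, so the per-report averaging here is an assumption, and the function names `score_report` and `mean_scores` are hypothetical.

```python
def score_report(predicted, reference):
    """Precision, recall, and F1 for one report, comparing sets of HPO IDs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def mean_scores(reports):
    """Average each metric over (predicted, reference) pairs, one pair per report."""
    scores = [score_report(pred, ref) for pred, ref in reports]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


if __name__ == "__main__":
    # Illustrative data only; these are not results from the study.
    reports = [
        ({"HP:0001250", "HP:0000252"}, {"HP:0001250", "HP:0001252"}),
        ({"HP:0001263"}, {"HP:0001263"}),
    ]
    print(mean_scores(reports))
```

In this framing, a false positive is a predicted HPO ID that is absent from the manually assigned reference set for that report.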
Conclusions
- RAG-HPO is a user-friendly, adaptable tool that significantly outperforms standard HPO-matching tools.
- Its enhanced precision and recall accelerate the identification of genetic mechanisms underlying rare diseases.
- RAG-HPO represents a substantial advancement in phenotypic analysis for genetic research and clinical genomics.