



High-quality data selection-driven instruction tuning for biomedical large language models.

Jieqiong Zheng1, Lu Sun1, Xinyu He2

  • 1Dalian Neusoft University of Information, No. 8 Software Garden Road, Ganjingzi District, Dalian, 116000, China.

Journal of Biomedical Informatics | April 24, 2026
Summary
This summary is machine-generated.

This study introduces a data selection framework that improves the training of large language models (LLMs) for biomedical natural language processing (NLP) tasks. BiomedicalLLM, the model fine-tuned with this framework, achieved an average F1-score gain of 3.3% over baseline methods.

Keywords:
Biomedical instruction dataset; Data quality; Data selection; LLM; Natural language processing


Area of Science:

  • Biomedical Natural Language Processing (NLP)
  • Machine Learning
  • Artificial Intelligence

Background:

  • Training large language models (LLMs) for biomedical NLP tasks is computationally intensive and requires high-quality data.
  • Existing data selection methods may not optimally adapt to the diverse challenges within biomedical NLP, such as named entity recognition (NER), relation extraction (RE), event extraction (EE), and text classification (TXTCLASS).
  • Efficient training strategies are crucial for advancing clinical and research applications of LLMs in the biomedical domain.

Purpose of the Study:

  • To present a novel data selection framework that enhances the training efficiency of LLMs for critical biomedical NLP tasks.
  • To introduce and validate the Data Selection (DS) score as a metric for quantifying instructional context's impact on response generation.
  • To develop and evaluate a fine-tuned LLM, BiomedicalLLM, using the proposed data selection methodology.

Main Methods:

  • Developed a Data Selection (DS) score to measure the influence of instructional context on model response losses.
  • Employed the DS method to filter high-quality data from biomedical datasets for specific NLP tasks (NER, RE, EE, TXTCLASS).
  • Fine-tuned a base LLM on the selected dataset, resulting in the BiomedicalLLM model, and conducted experiments and ablation studies.
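The summary does not give the formula for the DS score, so the following is only a minimal sketch of one plausible reading: score each sample by how much the instruction lowers the model's response loss, then keep the top-scoring share within each task. The `Sample` fields, `ds_score`, `select_per_task`, the relative-drop formula, the "higher is better" direction, and the loss values are all assumptions made for illustration, not the authors' method. In practice the two losses would come from running the base LLM on the response with and without the instruction.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    task: str                    # e.g. "NER", "RE", "EE", "TXTCLASS"
    loss_with_context: float     # mean response loss given the instruction
    loss_without_context: float  # mean response loss with the instruction removed


def ds_score(sample: Sample) -> float:
    """Assumed score: relative drop in response loss caused by the instruction."""
    if sample.loss_without_context <= 0:
        raise ValueError("loss_without_context must be positive")
    drop = sample.loss_without_context - sample.loss_with_context
    return drop / sample.loss_without_context


def select_per_task(samples: List[Sample], fraction: float) -> List[Sample]:
    """Keep the top-scoring fraction of samples within each task (at least one)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    by_task: Dict[str, List[Sample]] = {}
    for s in samples:
        by_task.setdefault(s.task, []).append(s)
    kept: List[Sample] = []
    for group in by_task.values():
        ranked = sorted(group, key=ds_score, reverse=True)
        kept.extend(ranked[: max(1, math.ceil(fraction * len(group)))])
    return kept


# Hypothetical loss values, for illustration only.
pool = [
    Sample("NER", 1.0, 2.0),
    Sample("NER", 1.8, 2.0),
    Sample("RE", 0.5, 2.0),
    Sample("RE", 2.0, 2.0),
]
selected = select_per_task(pool, fraction=0.5)
print([(s.task, round(ds_score(s), 2)) for s in selected])
```

Ranking within each task rather than over the whole pool is one simple way to keep every task represented; the paper's actual selection rule may differ.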

Main Results:

  • The BiomedicalLLM model, trained with the DS framework, achieved an average F1-score improvement of 3.3% across tasks compared to baseline methods.
  • Ablation studies confirmed the overall effectiveness of the proposed data selection framework.
  • Analysis showed that the DS method dynamically adjusts sample selection based on task characteristics, optimizing resource allocation for improved diversity and representation.
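For readers unfamiliar with the metric, here is a small sketch of how an average F1-score gain across tasks can be computed. The summary does not say whether the 3.3% is an absolute difference in F1 or a relative one; this sketch assumes an absolute difference averaged over tasks. The function names and all numbers are hypothetical placeholders, not results from the paper.

```python
from typing import Dict


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def mean_f1_gain(baseline: Dict[str, float], model: Dict[str, float]) -> float:
    """Average absolute F1 difference (model minus baseline) over shared tasks."""
    if set(baseline) != set(model):
        raise ValueError("both results must cover the same tasks")
    return sum(model[t] - baseline[t] for t in baseline) / len(baseline)


# Hypothetical per-task F1 scores, for illustration only.
baseline_f1 = {"NER": 0.5, "RE": 0.25}
model_f1 = {"NER": 0.75, "RE": 0.5}
print(f1_score(tp=8, fp=2, fn=2))
print(mean_f1_gain(baseline_f1, model_f1))
```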

Conclusions:

  • The novel data selection framework significantly enhances LLM training efficiency and performance in biomedical NLP.
  • The DS score provides a valuable metric for data quality assessment and selection in instruction-based LLM training.
  • The authors present the approach as a strategy for developing LLMs for clinical practice and biomedical research, and the model is available as open source.