A scalable framework for evaluating health language models.

Neil Mallinar¹, A. Ali Heydari¹, Xin Liu¹

¹Google Research, Mountain View, CA, USA.

NPJ Digital Medicine

February 27, 2026

Summary

This summary is machine-generated.

Adaptive Precise Boolean rubrics offer a faster, more reliable way to evaluate large language models (LLMs) in healthcare. This new framework improves accuracy and efficiency for assessing LLM-generated health information.

Area of Science:

Artificial Intelligence
Computational Biology
Health Informatics

Background:

Large language models (LLMs) show promise in generating personalized health insights from patient data.
Current LLM evaluation in healthcare relies on human experts, which is slow, costly, and prone to bias.
Efficient and rigorous evaluation methods are needed to ensure the safety and accuracy of LLM health applications.

Purpose of the Study:

To introduce Adaptive Precise Boolean rubrics, a novel framework for evaluating open-ended LLM responses.
To streamline both human and automated evaluation processes for LLM-generated health content.
To improve the scalability and cost-effectiveness of LLM assessment in complex medical domains.

Main Methods:

Developed an evaluation framework using a minimal set of targeted Boolean questions to identify response gaps.
Contrasted complex evaluation targets with precise, granular targets answerable with simple Boolean responses.
Validated the framework in metabolic health, covering diabetes, cardiovascular disease, and obesity.
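
The idea of a small, targeted set of Boolean questions can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `RubricItem` type, the example questions, and the topic-tag filter used as the "adaptive" step are assumptions made for illustration, since this summary does not describe how the study selects or words its questions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    """One yes/no criterion. `topics` is a hypothetical relevance tag set."""
    question: str
    topics: frozenset


# Hypothetical rubric; the questions are illustrative, not taken from the study.
RUBRIC = [
    RubricItem("Does the response reference the user's glucose data?",
               frozenset({"diabetes"})),
    RubricItem("Does the response reference the user's blood pressure?",
               frozenset({"cardiovascular"})),
    RubricItem("Does the response mention the user's weight or BMI?",
               frozenset({"obesity"})),
    RubricItem("Does the response avoid giving a diagnosis?",
               frozenset({"diabetes", "cardiovascular", "obesity"})),
]


def select_items(rubric, query_topics):
    """Adaptive step (assumed): keep only items that share a topic with the query."""
    return [item for item in rubric if item.topics & query_topics]


def find_gaps(items, answers):
    """Return the questions answered False, i.e. the gaps in the response."""
    return [item.question for item, ok in zip(items, answers) if not ok]


def pass_rate(answers):
    """Fraction of the selected Boolean items answered True."""
    return sum(answers) / len(answers) if answers else 0.0


# Usage: a diabetes-only query keeps two of the four items.
selected = select_items(RUBRIC, frozenset({"diabetes"}))
answers = [True, False]  # one rater's yes/no answers, in the order of `selected`
gaps = find_gaps(selected, answers)
rate = pass_rate(answers)
```

Because each item is a single yes/no judgment, the same structure could be filled in by an expert, a non-expert, or an automated grader; which of those the answers come from is left open in this sketch.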

Main Results:

Adaptive Precise Boolean rubrics achieved substantially higher inter-rater agreement than Likert scales for both expert and non-expert evaluators.
The framework required approximately half the evaluation time of traditional Likert-based methods.
The framework showed improved efficiency and scalability, particularly with automated and non-expert assessments.
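
Inter-rater agreement on Boolean answers can be quantified in several ways. The sketch below uses Cohen's kappa for two raters, which is an assumption: this summary does not state which agreement statistic the study reports. The function name and the toy ratings are hypothetical and are not data from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Works for any hashable labels, so the same function applies to
    Boolean rubric answers and to Likert scores treated as categories.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items where the two raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy yes/no answers from two raters on eight rubric items (made-up values).
rater_1 = [True, True, False, True, False, True, True, False]
rater_2 = [True, True, False, True, False, True, False, False]
kappa = cohens_kappa(rater_1, rater_2)  # 7 of 8 match, chance agreement 0.5 -> 0.75
```

Kappa corrects raw agreement for the agreement expected by chance, so it is more conservative than simply counting matches. With more than two raters, a statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute.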

Conclusions:

Adaptive Precise Boolean rubrics offer a more efficient and reliable method for evaluating LLMs in healthcare.
The framework enhances scalability and cost-effectiveness, enabling broader adoption of LLM health applications.
This approach facilitates more extensive and rigorous assessment of LLM-generated medical information.