Large Language Models in Radiologic Numerical Tasks: A Thorough Evaluation and Error Analysis.

Ali Nowroozi¹, Masha Bondarenko¹, Adrian Serapio¹

¹Center for Intelligent Imaging, Department of Radiology and Biomedical Imaging, University of California, San Francisco (UCSF), San Francisco, CA, USA.

Journal of Imaging Informatics in Medicine

January 21, 2026

Summary

This summary is machine-generated.

Large language models (LLMs) were evaluated on numerical tasks in radiology. Reinforcement learning (RL) reasoning models demonstrated consistently high performance and accuracy, and no mathematical errors were found in their outputs.

Keywords:

Data extraction Large language models Mathematics Numbers Radiology reports Reasoning

Area of Science:

Medical Imaging and Artificial Intelligence
Natural Language Processing in Healthcare

Background:

Large language models (LLMs) show promise in processing clinical text.
Evaluating LLM performance in specific medical domains like radiology is crucial.

Purpose of the Study:

To assess the performance of various LLMs on radiology numerical extraction and judgment tasks.
To conduct a detailed error analysis of LLM outputs in these tasks.

Main Methods:

Six radiology tasks were defined: three extraction tasks (T-score, common bile duct [CBD] diameter, lung nodule size) and three judgment tasks (PET hypermetabolism, osteoporosis, CBD dilation).
LLMs evaluated included Llama 3.1 8B, DeepSeek R1 Distill Llama 8B, OpenAI o1-mini, and OpenAI GPT-5-mini, using data from MIMIC-III and institutional databases.
Manual review and error analysis were performed on all incorrect LLM outputs.
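The extraction-plus-judgment structure of the T-score and osteoporosis tasks can be illustrated with a deterministic, rule-based baseline of the kind sometimes used to cross-check model answers. This is a minimal sketch, not the authors' LLM pipeline: the regular expression, the helper names `extract_t_score` and `classify_bone_density`, and the choice to classify on the lowest reported T-score are illustrative assumptions. The cut-offs follow the WHO densitometric categories (T ≤ -2.5 osteoporosis, -2.5 < T < -1.0 osteopenia, T ≥ -1.0 normal).

```python
import re
from typing import Optional

# Illustrative pattern (an assumption, not from the study): "T-score", "T score" or
# "Tscore", then up to 20 non-digit characters, then a signed decimal number.
T_SCORE_PATTERN = re.compile(r"T[-\s]?score\D{0,20}?(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_t_score(report: str) -> Optional[float]:
    """Extraction step: return the lowest T-score mentioned, or None if there is none."""
    values = [float(v) for v in T_SCORE_PATTERN.findall(report)]
    return min(values) if values else None


def classify_bone_density(t_score: float) -> str:
    """Judgment step: map a T-score to a WHO densitometric category."""
    if t_score <= -2.5:
        return "osteoporosis"
    if t_score < -1.0:
        return "osteopenia"
    return "normal"


report = "DXA: lumbar spine T-score of -1.8; left femoral neck T-score -2.7."
t = extract_t_score(report)
print(t, classify_bone_density(t))  # -2.7 osteoporosis
```

Using the lowest site's value is an assumption of this sketch; free-text reports also mention Z-scores, multiple sites and prior examinations that a fixed pattern will not handle reliably.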

Main Results:

For extraction tasks, the RL reasoning models (o1-mini, GPT-5-mini) achieved >95% accuracy, while Llama varied across tasks (86%-98.7%).
In judgment tasks, o1-mini and GPT-5-mini achieved accuracies of 91.7% and 99.0%, respectively, with 100% accuracy in osteoporosis detection.
No mathematical errors were found in o1-mini or GPT-5-mini outputs. An answer-only output format reduced the performance of Llama and DeepSeek-distilled Llama.
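Accuracy on tasks like these reduces to the fraction of cases whose output matches a reference value, and the incorrect cases are the ones passed to manual error review. A minimal scoring sketch, assuming model answers have already been parsed into floats (None when no number could be parsed) and that a small absolute tolerance counts as a match; the `Prediction` record, the tolerance default, and the toy cases are hypothetical and are not data from the study.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Prediction:
    case_id: str
    predicted: Optional[float]  # None when no number could be parsed from the output
    reference: float            # ground-truth value from manual annotation


def is_correct(p: Prediction, tol: float) -> bool:
    """Correct if a number was parsed and it lies within tol of the reference."""
    return p.predicted is not None and abs(p.predicted - p.reference) <= tol


def accuracy(preds: Sequence[Prediction], tol: float = 1e-6) -> float:
    """Fraction of cases scored correct; the default tolerance is an assumption."""
    if not preds:
        raise ValueError("no predictions to score")
    return sum(is_correct(p, tol) for p in preds) / len(preds)


def cases_for_review(preds: Sequence[Prediction], tol: float = 1e-6) -> List[str]:
    """IDs of incorrect outputs, i.e. the cases that would go to manual error analysis."""
    return [p.case_id for p in preds if not is_correct(p, tol)]


# Toy example (made-up values, not study data): CBD diameters in mm.
toy = [
    Prediction("r1", 7.0, 7.0),
    Prediction("r2", 9.0, 9.5),   # wrong value
    Prediction("r3", None, 4.0),  # no parseable answer
    Prediction("r4", 12.0, 12.0),
]
print(accuracy(toy))          # 0.5
print(cases_for_review(toy))  # ['r2', 'r3']
```

Counting unparseable answers as incorrect, rather than dropping them, keeps formatting failures visible in the accuracy figure.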

Conclusions:

Reinforcement learning (RL) reasoning LLMs exhibit consistently high performance and accuracy in radiology numerical tasks, with no mathematical errors.
Non-RL models can also achieve acceptable performance, depending on the specific task complexity.