Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Multimodal LLM vs. Human-Measured Features for AI Predictions of Autism in Home Videos. Algorithms, 2026.

Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder. Algorithms, 2026.

Remote Assessment of Parkinson Disease Using Deep Learning on Structured Mouse-Trace Data From Suspected Cases: Machine-Learning Pilot Feasibility Study. JMIR formative research, 2026.

The effect of distributional information on the categorization of unaccusativity. Journal of child language, 2026.

Correlates of Fitness Tracker Ownership and Use in Cancer Survivors: Cross-Sectional Survey. JMIR cancer, 2026.

mHealth technologies in research studying cardiovascular health in cancer: A systematic review. PLOS digital health, 2025.

Same journal

Supporting Radiology Resident Education and Clinical Decision-Making With Large Language Models: Comparative Study of Reasoning Models DeepSeek-R1 and ChatGPT-o1. JMIR AI, 2026.

Patient Perceptions on the Use of Artificial Intelligence in Creating Clinical Research Documents: Survey Study. JMIR AI, 2026.

Application of Language Models for the Analysis of Adverse Drug Events in Pharmaceutical Research and Development: Scoping Review. JMIR AI, 2026.

Correction: Deep Learning for Age Estimation and Sex Prediction Using Mandibular-Cropped Cephalometric Images: Comparative Model Development and Validation Study. JMIR AI, 2026.

AI-Assisted Systematic Literature Review of the Economic Burden of Pneumococcal Disease: Development and Validation Study. JMIR AI, 2026.

Knowledge-Augmented Large Language Model for Multimodal Electronic Health Record-Based Risk Prediction: Development and Validation Study. JMIR AI, 2026.

Related Experiment Video

Updated: Jan 14, 2026

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Aiding Large Language Models Using Clinical Scoresheets for Neurobehavioral Diagnostic Classification From Text:

Kaiying Lin¹, Abdur Rasool², Saimourya Surabhi³

¹Institute of Linguistics, Academia Sinica, Taipei, Taiwan.

October 21, 2025

Summary

This summary is machine-generated.

Large language models (LLMs) show limited diagnostic accuracy in psychiatry and behavioral sciences. Specialized machine learning models outperform current LLMs, indicating a need for advanced prompt engineering for clinical applications.

Keywords:

AI; LLM; artificial intelligence; chatbot; classification; large language model; neurological diagnostics

More Related Videos

Author Spotlight: Validation of SICOLE-R for Assessing Cognitive and Reading Skills in Spanish-Speaking Children and Its Role in Personalized Education

Published on: August 16, 2024

Measuring Statistical Learning Across Modalities and Domains in School-Aged Children Via an Online Platform and Neuroimaging Techniques

Published on: June 30, 2020

Area of Science:

Artificial Intelligence
Psychiatry
Behavioral Sciences

Background:

Large language models (LLMs) demonstrate advanced capabilities, but their diagnostic utility in psychiatry is under-explored.
Automated diagnostics in behavioral sciences using LLMs require further investigation.

Purpose of the Study:

Evaluate LLM chatbot diagnostic performance for neuropsychiatric conditions (autism, aphasia, depression).
Compare direct diagnosis vs. code generation prompting strategies with and without clinical scales.
Benchmark LLM performance against traditional machine learning classifiers.

Main Methods:

Tested ChatGPT, Gemini, and Claude models with direct diagnosis and code generation prompting.
Utilized ASDBank, AphasiaBank, and Distress Analysis Interview Corpus datasets.
Assessed performance with and without structured clinical assessment scales, comparing to ML benchmarks.
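
The design described above crosses two prompting strategies (direct diagnosis, code generation) with two scale conditions (with and without a clinical assessment scale). A minimal Python sketch of how such a 2 x 2 prompt grid might be assembled is given below. The function names, the prompt wording, and the `scale_text` argument are all hypothetical and chosen for illustration; the study's actual prompts are not given in this summary.

```python
from itertools import product

# The two prompting strategies named in the summary.
STRATEGIES = ("direct_diagnosis", "code_generation")


def build_prompt(transcript, condition, strategy, scale_text=None):
    """Assemble one prompt. The wording is illustrative, not the study's."""
    if strategy == "direct_diagnosis":
        task = (
            f"Based on the transcript below, does the speaker show signs "
            f"of {condition}? Answer yes or no."
        )
    elif strategy == "code_generation":
        task = (
            f"Write Python code that classifies the transcript below as "
            f"{condition} or not {condition}."
        )
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    parts = [task]
    if scale_text is not None:
        # Optional clinical assessment scale, included verbatim as guidance.
        parts.append("Clinical assessment scale:\n" + scale_text)
    parts.append("Transcript:\n" + transcript)
    return "\n\n".join(parts)


def prompt_grid(transcript, condition, scale_text):
    """Return all four (strategy, use_scale) prompt variants for one transcript."""
    grid = {}
    for strategy, use_scale in product(STRATEGIES, (False, True)):
        grid[(strategy, use_scale)] = build_prompt(
            transcript, condition, strategy, scale_text if use_scale else None
        )
    return grid


if __name__ == "__main__":
    # Placeholder inputs only; no real corpus data is used here.
    demo = prompt_grid("placeholder transcript", "aphasia", "placeholder scale item")
    for key in demo:
        print(key)
```

Each key of the returned dictionary identifies one experimental condition, so responses from any chatbot can be stored and scored per condition.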

Main Results:

Clinical scales offered minimal performance improvement across datasets.
LLM performance was inconsistent and generally below existing machine learning benchmarks.
Code generation prompting improved F1-scores on AphasiaBank (up to 86.5%), but direct-diagnosis performance remained low on the other datasets.
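
For readers unfamiliar with the metric reported above, the F1-score is the harmonic mean of precision and recall. The Python sketch below computes it for a binary labeling. It assumes a single positive class; the summary does not state whether the study used binary, macro, or weighted averaging, and the labels in the usage example are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up labels: 3 true positives, 1 false positive, 1 false negative.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1]
    print(f1_score(y_true, y_pred))  # precision = recall = 0.75, so F1 = 0.75
```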

Conclusions:

Current LLM chatbots, prompted naively, underperform specialized machine learning models in psychiatric diagnostics.
Clinical assessment scales may offer slight improvements, but advanced prompt engineering is crucial for clinical utility.