Aiding Large Language Models Using Clinical Scoresheets for Neurobehavioral Diagnostic Classification From Text:

Kaiying Lin (1), Abdur Rasool (2), Saimourya Surabhi (3)

  • (1) Institute of Linguistics, Academia Sinica, Taipei, Taiwan.

JMIR AI | October 21, 2025
Summary
This summary is machine-generated.

Large language models (LLMs) show limited diagnostic accuracy in psychiatry and behavioral sciences. Specialized machine learning models outperform current LLMs, indicating a need for advanced prompt engineering for clinical applications.

Keywords:
AI, LLM, artificial intelligence, chatbot, classification, large language model, neurological diagnostics

Area of Science:

  • Artificial Intelligence
  • Psychiatry
  • Behavioral Sciences

Background:

  • Large language models (LLMs) demonstrate advanced capabilities, but their diagnostic utility in psychiatry is under-explored.
  • Automated diagnostics in behavioral sciences using LLMs require further investigation.

Purpose of the Study:

  • Evaluate LLM chatbot diagnostic performance for neuropsychiatric conditions (autism, aphasia, depression).
  • Compare direct diagnosis vs. code generation prompting strategies with and without clinical scales.
  • Benchmark LLM performance against traditional machine learning classifiers.

Main Methods:

  • Tested ChatGPT, Gemini, and Claude models with direct diagnosis and code generation prompting.
  • Utilized ASDBank, AphasiaBank, and Distress Analysis Interview Corpus datasets.
  • Assessed performance with and without structured clinical assessment scales, comparing to ML benchmarks.
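
The prompting conditions listed above (direct diagnosis vs. code generation, each with or without a clinical scale) can be pictured as a small 2×2 grid of prompt variants. The sketch below is only illustrative: the function names, the strategy labels, and all prompt wording are hypothetical, since this summary does not reproduce the study's actual prompts.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Hypothetical strategy labels; the study's own naming is not given in this summary.
STRATEGIES = ("direct", "codegen")


def build_prompt(transcript: str, condition: str, strategy: str,
                 scale: Optional[str] = None) -> str:
    """Assemble one prompt for a strategy, optionally prefixed by a clinical scale."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    parts = []
    if scale is not None:
        # "With clinical scale" condition: the scale text is supplied as context.
        parts.append(f"Clinical assessment scale:\n{scale}")
    parts.append(f"Transcript:\n{transcript}")
    if strategy == "direct":
        # Direct diagnosis: ask the model for the label itself (assumed wording).
        parts.append(f"Does the speaker show signs of {condition}? Answer 'yes' or 'no'.")
    else:
        # Code generation: ask the model to write a classifier (assumed wording).
        parts.append(
            "Write Python code that extracts features from the transcript "
            f"and classifies it as {condition} or not."
        )
    return "\n\n".join(parts)


def all_conditions(transcript: str, condition: str,
                   scale: str) -> Dict[Tuple[str, bool], str]:
    """Return the four prompt variants keyed by (strategy, uses_scale)."""
    return {
        (strategy, use_scale): build_prompt(
            transcript, condition, strategy, scale if use_scale else None
        )
        for strategy, use_scale in product(STRATEGIES, (False, True))
    }
```

Keying the variants by (strategy, uses_scale) keeps each model response traceable to the condition that produced it, which makes per-condition comparison straightforward.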

Main Results:

  • Clinical scales offered minimal performance improvement across datasets.
  • LLM performance was inconsistent and generally below existing machine learning benchmarks.
  • Code generation improved F1-scores for AphasiaBank (up to 86.5%), but direct-diagnosis scores remained low for the other datasets.
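
For readers unfamiliar with the F1-score reported above, it is the harmonic mean of precision and recall. The following is a minimal sketch for binary labels; the function name and the choice of 1 as the positive class are assumptions, and this summary does not say whether the study used binary, macro-, or micro-averaged F1.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Compute F1 for the positive class from paired true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined, so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it can differ sharply from plain accuracy on imbalanced clinical datasets.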

Conclusions:

  • Current LLM chatbots, prompted naively, underperform specialized machine learning models in psychiatric diagnostics.
  • Clinical assessment scales may offer slight improvements, but advanced prompt engineering is crucial for clinical utility.