Automated Safety Plan Scoring in Outpatient Mental Health Settings Using Large Language Models: Exploratory Study.

Hayoung K Donnelly¹,², Gregory K Brown¹, Kelly L Green¹

¹Department of Psychiatry, University of Pennsylvania, Philadelphia, PA, United States.

JMIR Mental Health

January 8, 2026

Summary

This summary is machine-generated.

Automated tools built on large language models (LLMs) can assess the quality of suicide prevention safety plans. LLaMA 3 and o3-mini outperformed GPT-4 in evaluating these plans.

Area of Science:

Artificial Intelligence in Mental Health
Natural Language Processing Applications
Clinical Psychology Research

Background:

The Safety Planning Intervention (SPI) is a suicide prevention intervention that produces a written plan to help patients manage suicide risk.
Higher-quality safety plans (complete, personalized, and specific) are more effective in reducing suicide risk.
Current methods for assessing safety plan quality are labor-intensive, which limits how often clinicians receive feedback.

Purpose of the Study:

To develop an automated tool, the Safety Plan Fidelity Rater, for assessing the quality of written safety plans.
To use three large language models (GPT-4, LLaMA 3, and o3-mini) for quality assessment.

Main Methods:

Used 266 deidentified safety plans from outpatient mental health settings in New York.
Scored four safety plan components with the LLMs: warning signs, internal coping strategies, environmental safety, and reasons for living.
Compared predictive performance across the three LLMs while optimizing scoring systems, prompts, and parameters.

Main Results:

LLaMA 3 and o3-mini outperformed GPT-4 in assessing safety plan quality.
Step-specific scoring systems, selected by weighted F1-score, are recommended for optimal performance.

Conclusions:

LLMs show considerable potential to provide clinicians with timely, accurate feedback on safety plan quality.
Automated feedback can enhance the implementation and effectiveness of the Safety Planning Intervention in community mental health practices.

Keywords:

artificial intelligence, clinician support, generative AI, mental health informatics, patient-reported data, suicide
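The Main Results recommend choosing step-specific scoring systems by weighted F1-score. The summary does not report the rating labels or the evaluation code, so the following is only a minimal sketch under stated assumptions: each safety plan step is rated by a human and by an LLM on a small set of labels, and the weighted F1-score is the per-class F1, 2TP / (2TP + FP + FN), averaged with weights equal to each class's share of the human ratings (the same definition scikit-learn uses for average="weighted"). The function names, the 0/1/2 rating labels, and the "binary" and "three_level" scheme names are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def weighted_f1(y_true: list[int], y_pred: list[int]) -> float:
    """Support-weighted mean of per-class F1 scores.

    Each class's F1 is weighted by how often that class occurs in y_true
    (the human reference ratings), as in scikit-learn's average="weighted".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_true - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator >= n_true > 0
        score += (n_true / total) * f1
    return score


def best_scheme_per_step(
    results: dict[str, dict[str, tuple[list[int], list[int]]]],
) -> dict[str, str]:
    """Hypothetical helper: for each plan step, pick the scoring scheme
    whose (human, LLM) ratings give the highest weighted F1-score."""
    return {
        step: max(schemes, key=lambda s: weighted_f1(*schemes[s]))
        for step, schemes in results.items()
    }


if __name__ == "__main__":
    # Hypothetical 0/1/2 ratings (e.g. absent / partial / complete) for one step.
    human = [2, 1, 0, 2, 1, 2]
    llm = [2, 1, 1, 2, 0, 2]
    print(round(weighted_f1(human, llm), 3))  # 0.667
```

Selecting the scheme separately for each step, rather than one scheme for the whole plan, is one plausible reading of "step-specific"; the study's actual selection procedure may differ.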