Updated: Jan 14, 2026

Assessing Large Language Models in Building a Structured Dataset From AskDocs Subreddit Data: Methodological Study.

Quinn Snell¹, Chase Westhoff¹, John Westhoff²

¹Brigham Young University, 3361 TMCB, Provo, UT, 84602, United States, 1 8014225098.

Journal of Medical Internet Research

October 22, 2025

Summary

This summary is machine-generated.

Large language models (LLMs) can extract health information from social media posts with accuracy comparable to that of human annotators. This supports their use for analyzing digital health communications and online user behavior.

Keywords:

Reddit; artificial intelligence; data extraction; large language models; unstructured text analysis

Area of Science:

Digital Health
Natural Language Processing
Computational Social Science

Background:

The subreddit r/AskDocs is a key platform for digital health consultations.
Analyzing unstructured user-generated content from forums like r/AskDocs is challenging.
Large language models (LLMs) offer advanced tools for extracting health information from social media.

Purpose of the Study:

To evaluate the efficacy of LLMs in transforming unstructured r/AskDocs data into a structured format.
To compare LLM data extraction performance against human annotators.
To assess the alignment of LLM-based data extraction with human cognitive processes.

Main Methods:

Data extraction from 2800 r/AskDocs posts using human annotators (medical students) and LLMs.
Human annotation included demographics, inquiry type, proxy relationship, chronic conditions, and consultation status.
LLM data extraction used engineered prompts (JSON output, few-shot examples) with models including Llama 3, Gemma, and GPT.
Cohen κ was used to assess inter-annotator reliability.
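
As a rough illustration of the prompting approach described above, the following Python sketch assembles a few-shot prompt that asks for JSON output and then parses a model reply. It is not the study's actual prompt: the field names, the worked example, and the helper names `build_prompt` and `parse_response` are all hypothetical, and the call to a model is left out entirely.

```python
import json

# Hypothetical field names, loosely based on the annotation categories above.
FIELDS = ["age", "sex", "inquiry_type", "proxy_relationship",
          "chronic_conditions", "consulted_professional"]

# Invented few-shot example for illustration; not taken from the study's data.
FEW_SHOT = [
    ("My 6yo son has had a fever for 3 days. Should I take him to urgent care?",
     {"age": 6, "sex": "male", "inquiry_type": "symptom",
      "proxy_relationship": "parent", "chronic_conditions": [],
      "consulted_professional": False}),
]


def build_prompt(post: str) -> str:
    """Assemble an instruction, the worked examples, and the target post."""
    parts = [
        "Extract the following fields from the post and reply with one JSON "
        "object only. Use null when a field is not stated. Fields: "
        + ", ".join(FIELDS) + "."
    ]
    for text, labels in FEW_SHOT:
        parts.append(f"Post: {text}\nJSON: {json.dumps(labels)}")
    parts.append(f"Post: {post}\nJSON:")
    return "\n\n".join(parts)


def parse_response(raw: str) -> dict:
    """Parse the span from the first '{' to the last '}' in a model reply.

    Fields that are missing, or a reply that is not valid JSON, yield None.
    """
    empty = {field: None for field in FIELDS}
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return empty
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return empty
    return {field: data.get(field) for field in FIELDS}
```

Restricting the output to a fixed set of keys keeps each model reply directly comparable, field by field, with a human annotation of the same post.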

Main Results:

Llama 3 70B (7 few-shot examples) and GPT-4 (2 few-shot examples) achieved the highest accuracy (87.4%) against the human-annotated gold standard.
Llama 3 70B demonstrated superior performance in coding health-related content.
GPT-4 excelled in extracting demographic information from unstructured posts.
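
The two agreement measures named in this summary, accuracy against a gold standard and Cohen κ between two annotators, can be sketched in a few lines of Python. This is a generic illustration of the standard definitions, not the study's evaluation code; the function names and any labels passed in are assumptions.

```python
from collections import Counter


def accuracy(predicted: list, gold: list) -> float:
    """Fraction of items where the predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters' labels."""
    n = len(rater_a)
    p_o = accuracy(rater_a, rater_b)  # observed agreement (also validates input)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: sum over labels of the product of marginal proportions.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)
```

Accuracy treats the human labels as ground truth, whereas κ discounts the agreement two annotators would reach by chance given how often each uses every label, which is why it suits inter-annotator reliability.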

Conclusions:

LLMs demonstrate comparable performance to human annotators in extracting demographic and health information from social media health forums.
This study validates LLMs as reliable tools for analyzing digital health communications.
LLMs show potential for advancing methodologies in digital research by understanding online behaviors and interactions.