Assessing Large Language Models in Building a Structured Dataset From AskDocs Subreddit Data: Methodological Study.

Quinn Snell¹, Chase Westhoff¹, John Westhoff²

¹Brigham Young University, 3361 TMCB, Provo, UT, 84602, United States, 1 8014225098.

Journal of Medical Internet Research

October 22, 2025

Summary

This summary is machine-generated.

Large language models (LLMs) effectively extract health information from social media, matching human accuracy. This validates LLMs for analyzing digital health communications and online user behavior.

Keywords:

Reddit; artificial intelligence; data extraction; large language models; unstructured text analysis

Area of Science:

Digital Health
Natural Language Processing
Computational Social Science

Background:

The subreddit r/AskDocs is a key platform for digital health consultations.
Analyzing unstructured user-generated content from forums like r/AskDocs is challenging.
Large language models (LLMs) offer advanced tools for extracting health information from social media.

Purpose of the Study:

To evaluate the efficacy of LLMs in transforming unstructured r/AskDocs data into a structured format.
To compare LLM data extraction performance against human annotators.
To assess the alignment of LLM-based data extraction with human cognitive processes.

Main Methods:

Data extraction from 2800 r/AskDocs posts using human annotators (medical students) and LLMs.
Human annotation included demographics, inquiry type, proxy relationship, chronic conditions, and consultation status.
LLM data extraction used engineered prompts (JSON output, few-shot examples) with models such as Llama 3, Gemma, and GPT; Cohen κ was used to assess inter-annotator reliability.
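The prompting approach described above (few-shot examples with JSON output) might look roughly like the following sketch. The field names in FIELDS, the prompt wording, and both helper functions are hypothetical illustrations loosely based on the annotation categories listed above; they are not the prompts or code used in the study, and no particular model API is assumed.

```python
import json

# Hypothetical field names, loosely based on the annotation categories above.
FIELDS = ["age", "sex", "inquiry_type", "proxy_relationship",
          "chronic_conditions", "prior_consultation"]


def build_prompt(post, examples):
    """Assemble a few-shot prompt that asks for a single JSON object.

    `examples` is a list of (post_text, labels_dict) pairs.
    """
    parts = [
        "Extract the following fields from the post and reply with one JSON "
        "object using exactly these keys: " + ", ".join(FIELDS) + ". "
        "Use null when a field is not stated."
    ]
    for example_post, example_labels in examples:
        parts.append("Post: " + example_post)
        parts.append("JSON: " + json.dumps(example_labels))
    # The post to be coded goes last, with the answer slot left open.
    parts.append("Post: " + post)
    parts.append("JSON:")
    return "\n\n".join(parts)


def parse_response(text):
    """Parse a model reply into a dict; every field is None if parsing fails."""
    empty = {field: None for field in FIELDS}
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return empty
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return empty
    if not isinstance(data, dict):
        return empty
    # Keep only the expected keys so stray output cannot add columns.
    return {field: data.get(field) for field in FIELDS}
```

Under these assumptions, the string returned by build_prompt would be sent to whichever model is being evaluated, and parse_response would turn each reply into one row of the structured dataset. Returning a row of None values on malformed output is one possible design choice; it keeps the row count aligned with the human-annotated posts.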

Main Results:

Llama 3 70B (7 few-shot examples) and GPT-4 (2 few-shot examples) achieved the highest accuracy (87.4%) against the human-annotated gold standard.
Llama 3 70B demonstrated superior performance in coding health-related content.
GPT-4 excelled in extracting demographic information from unstructured posts.
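The two agreement measures mentioned on this page, accuracy against a human-annotated gold standard and Cohen κ between annotators, can be sketched as below. This is a minimal illustration assuming one categorical label per post and exact-match scoring; the study's actual scoring rules are not given here, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of items where the predicted label equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each label the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's label frequencies, summed.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)
```

In this sketch, accuracy would compare a model's labels with the gold standard for a given field, while cohen_kappa would compare two human annotators on the same posts, correcting their raw agreement for agreement expected by chance.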

Conclusions:

LLMs demonstrate comparable performance to human annotators in extracting demographic and health information from social media health forums.
This study validates LLMs as reliable tools for analyzing digital health communications.
LLMs show potential for advancing methodologies in digital research by understanding online behaviors and interactions.