


Assessing Large Language Models in Building a Structured Dataset From AskDocs Subreddit Data: Methodological Study.

Quinn Snell1, Chase Westhoff1, John Westhoff2

  • 1Brigham Young University, 3361 TMCB, Provo, UT, 84602, United States, 1 8014225098.

Journal of Medical Internet Research | October 22, 2025
Summary
This summary is machine-generated.

Large language models (LLMs) can extract demographic and health information from r/AskDocs posts with accuracy comparable to that of human annotators. The findings support using LLMs to analyze digital health communications and online user behavior.

Keywords:
Reddit; artificial intelligence; data extraction; large language models; unstructured text analysis


Area of Science:

  • Digital Health
  • Natural Language Processing
  • Computational Social Science

Background:

  • The subreddit r/AskDocs is a key platform for digital health consultations.
  • Analyzing unstructured user-generated content from forums like r/AskDocs is challenging.
  • Large language models (LLMs) offer advanced tools for extracting health information from social media.

Purpose of the Study:

  • To evaluate the efficacy of LLMs in transforming unstructured r/AskDocs data into a structured format.
  • To compare LLM data extraction performance against human annotators.
  • To assess the alignment of LLM-based data extraction with human cognitive processes.

Main Methods:

  • Data extraction from 2800 r/AskDocs posts using human annotators (medical students) and LLMs.
  • Human annotation included demographics, inquiry type, proxy relationship, chronic conditions, and consultation status.
  • LLM data extraction used engineered prompts (JSON output, few-shot examples) with models including Llama 3, Gemma, and GPT. Cohen κ was used to assess inter-annotator reliability.
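
The prompting approach described in the methods (few-shot examples plus a JSON answer format) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual prompts: the field names in FIELDS, the instruction wording, and the helper names build_prompt and parse_reply are all hypothetical, and the call to a model is left out entirely.

```python
import json

# Hypothetical field names, loosely based on the annotation categories
# listed in the methods; the study's real schema is not given here.
FIELDS = ["age", "sex", "inquiry_type", "proxy_relationship",
          "chronic_condition", "prior_consultation"]


def build_prompt(post, examples):
    """Assemble a few-shot prompt that asks for a single JSON object.

    `examples` is a list of (post_text, labels_dict) pairs used as
    few-shot demonstrations.
    """
    lines = [
        "Extract the following fields from the post and answer with one JSON object.",
        "Fields: " + ", ".join(FIELDS),
        "Use null for any field the post does not state.",
        "",
    ]
    for example_post, example_labels in examples:
        lines += ["Post: " + example_post,
                  "JSON: " + json.dumps(example_labels),
                  ""]
    # The prompt ends with an open "JSON:" slot for the model to complete.
    lines += ["Post: " + post, "JSON:"]
    return "\n".join(lines)


def parse_reply(reply):
    """Turn a model reply into a dict with every expected field.

    Replies that are not valid JSON objects yield None for all fields,
    so malformed output is scored as missing rather than crashing.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in FIELDS}
```

The number of few-shot examples is simply the length of `examples`, so comparing, say, 2 versus 7 demonstrations only requires passing a shorter or longer list.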

Main Results:

  • Llama 3 70B (7 few-shot examples) and GPT-4 (2 few-shot examples) achieved the highest accuracy (87.4%) against the human-annotated gold standard.
  • Llama 3 70B demonstrated superior performance in coding health-related content.
  • GPT-4 excelled in extracting demographic information from unstructured posts.
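
The two agreement measures named in this summary, accuracy against a gold standard and Cohen κ, can be computed as in the minimal sketch below. The label values in the usage example are invented for illustration and are not data from the study; the function names are likewise assumptions.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of items where the predicted label equals the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    n = len(rater_a)
    observed = accuracy(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented example labels (not from the study).
annotator_1 = ["self", "self", "proxy", "proxy"]
annotator_2 = ["self", "proxy", "proxy", "proxy"]
print(accuracy(annotator_1, annotator_2))     # 0.75
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

Unlike raw accuracy, κ discounts agreement that would occur by chance, which is why it is the usual choice for inter-annotator reliability.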

Conclusions:

  • LLMs demonstrate comparable performance to human annotators in extracting demographic and health information from social media health forums.
  • This study validates LLMs as reliable tools for analyzing digital health communications.
  • LLMs show potential for advancing methodologies in digital research by understanding online behaviors and interactions.