



Sample Size Considerations for Fine-Tuning Large Language Models for Named Entity Recognition Tasks: Methodological Study

Zoltan P Majdik¹, S Scott Graham², Jade C Shiva Edward²

  • ¹ Department of Communication, North Dakota State University, Fargo, ND, United States.

JMIR AI | June 14, 2024
Summary
This summary is machine-generated.

Modest sample sizes are sufficient to fine-tune large language models (LLMs) effectively for biomedical named entity recognition (NER). The entity density of the training data is a key factor, and data quality may matter more than data volume for optimal performance.

Keywords:
annotation; conflict of interest; disclosure; disclosures; expert annotation; fine-tuning; language model; large language models; machine learning; named-entity recognition; natural language processing; sample; sample size; statement; statements; transfer learning


Area of Science:

  • Health Informatics
  • Natural Language Processing
  • Biomedical Data Science

Background:

  • Large language models (LLMs) offer significant potential for health informatics applications.
  • However, practical guidance on the sample sizes required to fine-tune LLMs in biomedical and health policy contexts is lacking.

Purpose of the Study:

  • To evaluate sample size and selection techniques for fine-tuning LLMs.
  • To improve named entity recognition (NER) for conflict of interest disclosure statements.

Main Methods:

  • Annotated 490 conflict of interest disclosure statements to identify "PERSON" and "ORG" entities.
  • Drew 2500 stratified random samples of varying sizes for fine-tuning.
  • Trained Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) models using these samples.
  • Assessed the impact of sample size (sentences) and entity density (entities per sentence [EPS]) on NER performance (F1-score) using multiple regression.
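
The quantities named in the methods above can be illustrated with a minimal Python sketch. It is not the authors' code: the `(start, end, label)` entity format, the choice to stratify by entity count per sentence (0, 1, 2 or more), and all function names are assumptions made for illustration, and the F1-score shown is a plain exact-match, micro-averaged version.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical entity format: (start token, end token, label).
Entity = Tuple[int, int, str]


def entities_per_sentence(annotations: List[Set[Entity]]) -> float:
    """Entity density (EPS): mean number of annotated entities per sentence."""
    if not annotations:
        raise ValueError("need at least one sentence")
    return sum(len(a) for a in annotations) / len(annotations)


def entity_f1(gold: List[Set[Entity]], pred: List[Set[Entity]]) -> float:
    """Micro-averaged F1 where a prediction counts only on an exact match."""
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_gold = sum(len(g) for g in gold)
    n_pred = sum(len(p) for p in pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def stratified_sample(entity_counts: List[int], n: int, seed: int = 0) -> List[int]:
    """Sample about n sentence indices, proportionally from each stratum.

    Strata are an assumption: sentences with 0, 1, or 2+ entities.
    Rounding means the result can differ slightly from n.
    """
    rng = random.Random(seed)
    strata: Dict[int, List[int]] = {}
    for idx, count in enumerate(entity_counts):
        strata.setdefault(min(count, 2), []).append(idx)
    chosen: List[int] = []
    for key in sorted(strata):
        members = strata[key]
        k = round(n * len(members) / len(entity_counts))
        chosen.extend(rng.sample(members, min(k, len(members))))
    return sorted(chosen)


# Toy annotations, invented for this example only.
gold = [{(0, 2, "PERSON"), (5, 7, "ORG")}, set(), {(1, 3, "ORG")}]
pred = [{(0, 2, "PERSON")}, {(0, 1, "ORG")}, {(1, 3, "ORG")}]
eps = entities_per_sentence(gold)
f1 = entity_f1(gold, pred)
sample = stratified_sample([0] * 50 + [1] * 30 + [3] * 20, n=10)
print(eps, round(f1, 3), len(sample))
```

Repeating `stratified_sample` with different seeds and sizes would give a collection of training subsets. Each subset's size and EPS could then serve as predictors of the F1-score in a regression.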

Main Results:

  • Fine-tuned models achieved high NER performance (F1-score: 0.79–0.96).
  • Both sample size and EPS were significant predictors of model performance (P<.001).
  • Returns diminished beyond thresholds of 439–527 sentences for sample size and 1.36–1.38 for EPS.
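
One way to see what a point of diminishing returns looks like in practice is a linear-plateau fit: performance rises linearly with sample size up to a knot and stays flat afterwards. The sketch below is a simplification chosen for illustration and is not presented as the regression used in the study. The data are synthetic, and the function names and candidate knots are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def plateau_sse(xs: Sequence[float], ys: Sequence[float], knot: float) -> float:
    """Squared error of a model that is linear up to `knot`, then flat."""
    capped = [min(x, knot) for x in xs]
    intercept, slope = fit_line(capped, ys)
    return sum((y - (intercept + slope * c)) ** 2 for c, y in zip(capped, ys))


def find_plateau(xs: Sequence[float], ys: Sequence[float],
                 candidates: Sequence[float]) -> float:
    """Return the candidate knot with the smallest squared error."""
    return min(candidates, key=lambda k: plateau_sse(xs, ys, k))


# Synthetic data (not from the study): F1 rises until 400 sentences, then stalls.
sizes: List[float] = [float(s) for s in range(50, 1001, 50)]
scores: List[float] = [0.5 + 0.001 * min(s, 400.0) for s in sizes]
best_knot = find_plateau(sizes, scores, candidates=sizes)
print(best_knot)
```

With noisy real measurements, the grid of candidate knots and the flat-after-knot assumption would both deserve scrutiny. A spline that allows a reduced but nonzero slope after the knot is a common alternative.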

Conclusions:

  • Effective fine-tuning of LLMs for biomedical NER is achievable with modest sample sizes.
  • The entity density of the training data should approximate that of the production data.
  • Training data quality and the intended use of a model's architecture are critical factors, potentially more important than data volume or model size.