
A large language model framework for sample-free population synthesis.

Michael Jones¹, Richard Dawson¹, Jon Mills¹

  • ¹ School of Engineering, Newcastle University, Newcastle, United Kingdom.

PLOS ONE | June 2, 2026


Summary
This summary is machine-generated.

This study introduces a novel framework using large language models (LLMs) to create synthetic populations for agent-based models without needing microdata. The LLM-based approach generates accurate, household-structured populations from aggregate data, enhancing accessibility for data-constrained research.


Area of Science:

  • Computational Social Science
  • Demographic Modeling
  • Artificial Intelligence Applications

Background:

  • Agent-based models (ABMs) require realistic synthetic populations for accurate simulations in diverse fields.
  • Existing population synthesis methods often depend on scarce, privacy-restricted, or coarse-scale census microdata.
  • This limitation restricts the application of ABMs in data-scarce environments.

Purpose of the Study:

  • To present a sample-free framework for generating complete, household-structured synthetic populations using large language models (LLMs).
  • To enable the creation of detailed demographic representations directly from aggregate data, overcoming microdata limitations.
  • To expand the utility of ABMs in research settings with constrained data availability.

Main Methods:

  • A multi-step, LLM-agnostic framework involving objective definition, input preparation, LLM selection, and synthetic household generation.
  • Population synthesis is achieved through iterative prompting, where the LLM generates households guided by discrepancies between synthetic and target distributions.
  • The method leverages the LLM's pre-trained knowledge for plausible attribute combinations, ensuring statistical alignment and structural feasibility without model fine-tuning.
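The iterative, discrepancy-guided loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the target shares are made-up values, households are reduced to a single attribute (size), and `stub_llm` is a hypothetical stand-in for the real LLM call, which would receive the discrepancies as prompt feedback and return a full household with plausible member attributes.

```python
from collections import Counter

# Hypothetical target marginal for household size (shares sum to 1).
TARGET_SIZE = {1: 0.3, 2: 0.4, 3: 0.3}

def discrepancy(households, target):
    """Target share minus current synthetic share, per household size."""
    counts = Counter(len(h) for h in households)
    n = max(len(households), 1)  # avoid division by zero on the first step
    return {size: share - counts.get(size, 0) / n
            for size, share in target.items()}

def stub_llm(gaps):
    """Stand-in for the LLM (assumption): build one household of the
    most under-represented size. A real call would put `gaps` into the
    prompt and parse the generated household from the response."""
    size = max(gaps, key=gaps.get)
    return [{"member": i} for i in range(size)]

def synthesize(target, n_households):
    """Generate households one at a time, steering each request with
    the current gap between synthetic and target distributions."""
    households = []
    for _ in range(n_households):
        gaps = discrepancy(households, target)
        households.append(stub_llm(gaps))
    return households

population = synthesize(TARGET_SIZE, 100)
print(discrepancy(population, TARGET_SIZE))
```

Because each step fills the largest remaining deficit, the synthetic shares in this toy setting stay close to the targets. With a real LLM the generator is not deterministic, so the feedback loop only nudges the distribution rather than guaranteeing it.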

Main Results:

  • Global evaluation across 109 countries showed close reproduction of simple marginal distributions, measured by standardized root mean square error (SRMSE; lower is better): gender (SRMSE: 0.003) and household size (SRMSE: 0.026).
  • More complex attributes, household composition (SRMSE: 0.062) and age (SRMSE: 0.128), were reproduced with somewhat higher error.
  • Case studies in Newcastle upon Tyne (UK) and Dar es Salaam (Tanzania) validated the framework's effectiveness.
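For readers unfamiliar with the metric, the sketch below computes SRMSE under one common definition from the population synthesis literature: the root mean square error across categories, divided by the mean target value. The summary does not state which variant the authors used, so treat this formula as an assumption; the example distributions are made up.

```python
import math

def srmse(synthetic, target):
    """Standardized RMSE between two distributions over the same
    categories: sqrt(mean squared error) / mean target value.
    (Assumed definition; the source does not spell out its formula.)"""
    categories = list(target)
    n = len(categories)
    mse = sum((synthetic.get(c, 0.0) - target[c]) ** 2
              for c in categories) / n
    mean_target = sum(target[c] for c in categories) / n
    return math.sqrt(mse) / mean_target

# Hypothetical gender shares: target vs. a synthetic population.
target = {"female": 0.5, "male": 0.5}
synthetic = {"female": 0.6, "male": 0.4}
print(srmse(synthetic, target))  # approximately 0.2
print(srmse(target, target))     # 0.0 for a perfect match
```

A value of 0 means the synthetic distribution matches the target exactly, and larger values indicate a worse fit. Under this definition the score also grows with the number of categories, which is one reason fine-grained attributes tend to score higher than binary ones.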

Conclusions:

  • The developed framework successfully generates coherent, household-structured synthetic populations from aggregate data, eliminating the need for microdata.
  • This approach significantly enhances the applicability of agent-based modeling in diverse research areas facing data limitations.
  • The LLM-based method offers an accessible and low-data-requirement solution for constructing foundational demographic datasets for simulations.