Bias in Zipf's law estimators.

Charlie Pilgrim¹, Thomas T Hills²,³

  • ¹Mathematics for Real-World Systems Centre for Doctoral Training, The University of Warwick, Coventry, CV4 7AL, UK. charlie.pilgrim@warwick.ac.uk.

Scientific Reports | August 28, 2021
Summary
This summary is machine-generated.

Maximum likelihood estimators for power-law models are biased because they rely on an incorrect likelihood function. Even approximate Bayesian computation (ABC) methods show bias when applied to natural-language rank-frequency data.
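To make the likelihood problem concrete, here is a sketch of the likelihood a rank-based MLE typically maximizes. It assumes a finite vocabulary of N word types, a sample of n tokens, and an exponent λ; these symbols are our notation, and the paper's own parameterization may differ. Sorting the observed counts so that n_1 ≥ n_2 ≥ … ≥ n_N and treating each empirical rank r as if it were the true rank gives:

```latex
p_r = \frac{r^{-\lambda}}{H_{N,\lambda}}, \qquad H_{N,\lambda} = \sum_{k=1}^{N} k^{-\lambda}

\ln \mathcal{L}(\lambda) = -\lambda \sum_{r=1}^{N} n_r \ln r \;-\; n \ln H_{N,\lambda} \;+\; \text{const}
```

The weak step is equating empirical rank with true rank: the ranks are computed from the same noisy counts that the likelihood evaluates, and this likelihood does not account for that.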

Area of Science:

  • Quantitative Linguistics
  • Statistical Modeling
  • Computational Linguistics

Background:

  • Rank-frequency distributions, often modeled by power laws (e.g., Zipf's law), are common in natural language.
  • Existing maximum likelihood estimators (MLEs) for inferring power-law exponents from such data are known to be biased.
  • The bias in MLEs stems from the use of an inappropriate likelihood function.
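A minimal, stdlib-only Python sketch of such an estimator follows. It assumes a finite vocabulary, multinomial sampling of tokens, and a likelihood that treats empirical ranks as true ranks. The function names (`zipf_probs`, `sample_rank_frequency`, `naive_log_likelihood`, `naive_mle`) and the search bounds are hypothetical choices for illustration, not the code used in the paper.

```python
import math
import random
from collections import Counter


def zipf_probs(n_types, lam):
    """Power-law probabilities p_r proportional to r**-lam for ranks 1..n_types."""
    weights = [r ** -lam for r in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_rank_frequency(n_types, lam, n_tokens, rng):
    """Draw n_tokens tokens and return the counts sorted in descending order.

    The position in the sorted list is the *empirical* rank, which can differ
    from the true rank that generated the count.
    """
    probs = zipf_probs(n_types, lam)
    draws = rng.choices(range(n_types), weights=probs, k=n_tokens)
    counts = Counter(draws)
    # Keep unseen types (count 0) so the vocabulary size stays n_types.
    return sorted((counts.get(i, 0) for i in range(n_types)), reverse=True)


def naive_log_likelihood(lam, sorted_counts):
    """Log likelihood treating empirical rank r as the true rank (constant dropped)."""
    n_types = len(sorted_counts)
    n_tokens = sum(sorted_counts)
    h = sum(k ** -lam for k in range(1, n_types + 1))
    s = sum(c * math.log(r) for r, c in enumerate(sorted_counts, start=1))
    return -lam * s - n_tokens * math.log(h)


def naive_mle(sorted_counts, lo=0.01, hi=4.0, tol=1e-6):
    """Maximize the (concave) naive log likelihood by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if naive_log_likelihood(c, sorted_counts) > naive_log_likelihood(d, sorted_counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


rng = random.Random(0)
counts = sample_rank_frequency(n_types=1000, lam=1.0, n_tokens=2000, rng=rng)
print(round(naive_mle(counts), 3))
```

Golden-section search is adequate here because this log likelihood is concave in the exponent (the log of a sum of exponentials in λ is convex). Refitting many simulated samples and comparing the average estimate with the true exponent is a simple way to probe the bias described above.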

Purpose of the Study:

  • To derive the correct likelihood function for power-law inference from rank-frequency data.
  • To investigate the bias of existing and novel estimators, including approximate Bayesian computation (ABC), for power-law models.
  • To assess how much bias arises from assuming a simple probability distribution rather than the complex processes that generate natural language.

Main Methods:

  • Derivation of the theoretically correct likelihood function for power-law models.
  • Implementation and evaluation of an approximate Bayesian computation (ABC) method.
  • Analysis of bias in estimators when applied to both idealized Zipfian distributions and natural language data.
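The ABC idea can be sketched as plain rejection sampling: draw an exponent from a prior, simulate a corpus of the same size, rank the simulated counts the same way the observed ones were ranked, and keep the exponents whose simulations look most like the data. This is a generic rejection-ABC sketch, not the paper's implementation; the uniform prior on [0.5, 2.0], the log-count distance, the 5% acceptance fraction, and the helper names are our assumptions.

```python
import math
import random
from collections import Counter


def simulate_sorted_counts(n_types, lam, n_tokens, rng):
    """Simulate n_tokens from a finite power law and return descending counts."""
    weights = [r ** -lam for r in range(1, n_types + 1)]  # unnormalized is fine
    draws = rng.choices(range(n_types), weights=weights, k=n_tokens)
    counts = Counter(draws)
    return sorted((counts.get(i, 0) for i in range(n_types)), reverse=True)


def distance(obs, sim):
    """Mean absolute difference between log(1 + count) profiles, rank by rank."""
    return sum(abs(math.log1p(a) - math.log1p(b)) for a, b in zip(obs, sim)) / len(obs)


def abc_rejection(obs, n_draws, keep_frac, rng, prior=(0.5, 2.0)):
    """Rejection ABC: keep the exponents whose simulations lie closest to obs."""
    n_types, n_tokens = len(obs), sum(obs)
    scored = []
    for _ in range(n_draws):
        lam = rng.uniform(*prior)  # draw from a uniform prior
        # Simulate with the same vocabulary and sample size as the data, then rank
        # the simulated counts exactly as the observed counts were ranked.
        sim = simulate_sorted_counts(n_types, lam, n_tokens, rng)
        scored.append((distance(obs, sim), lam))
    scored.sort()
    n_keep = max(1, int(keep_frac * n_draws))
    return [lam for _, lam in scored[:n_keep]]


rng = random.Random(1)
observed = simulate_sorted_counts(n_types=200, lam=1.2, n_tokens=5000, rng=rng)
accepted = abc_rejection(observed, n_draws=300, keep_frac=0.05, rng=rng)
print(len(accepted), round(sum(accepted) / len(accepted), 3))
```

Because the simulator reproduces the ranking step, this approach never evaluates the intractable likelihood. As the study reports, however, it can only remove bias to the extent that the simulator matches the process that actually generated the data.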

Main Results:

  • The correct likelihood function is computationally intractable.
  • The proposed ABC method exhibits reduced bias compared to traditional MLEs for idealized data.
  • Both traditional MLEs and the investigated ABC method retain significant bias when applied to natural language, because both assume simple probability distributions.
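One way to see why the correct likelihood is intractable is offered here as our own illustration under assumed finite-vocabulary multinomial sampling, not as the paper's derivation. The data record only the sorted counts n_1 ≥ n_2 ≥ … ≥ n_N of n tokens over N word types, while the true rank behind each count is unobserved. The probability of the sorted counts therefore sums the multinomial probability over every distinct way of assigning those counts to the true ranks, where 𝒮 is the set of permutations σ of {1, …, N} that produce distinct count vectors:

```latex
P(n_1,\dots,n_N \mid \lambda)
  = \sum_{\sigma \in \mathcal{S}} \frac{n!}{\prod_{r=1}^{N} n_r!}
    \prod_{r=1}^{N} p_r^{\,n_{\sigma(r)}},
\qquad p_r = \frac{r^{-\lambda}}{H_{N,\lambda}},
\qquad H_{N,\lambda} = \sum_{k=1}^{N} k^{-\lambda}
```

A likelihood that equates empirical and true ranks keeps only the identity term σ(r) = r. Because 𝒮 can contain up to N! permutations, the full sum cannot be evaluated directly for vocabularies of thousands of word types.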

Conclusions:

  • Researchers must be aware of inherent biases when using current methods to infer power laws from rank-frequency data, especially in natural language.
  • The assumption of simple probability distributions is a critical limitation for accurate Zipf exponent estimation in linguistics.
  • Further development of methods that account for the complexity of natural language processes is needed for unbiased power-law inference.