Related Concept Videos

Choosing Between z and t Distribution

The z distribution and the Student t distribution are both used to estimate the population mean from the sample mean and standard deviation. To decide which distribution to use, one needs to consider the sample size, the shape of the population distribution, and whether the population standard deviation is known. If the population standard deviation is known and the population is normally distributed, or if the sample size is greater than 30, the z distribution is preferred. The Student t distribution is...
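
The rule of thumb above can be written as a small decision function. This is a minimal sketch: the function names are hypothetical, and because the passage is cut off before it states its own criteria for the t distribution, every case the stated rule does not assign to z is assumed to fall back to t. The helper z_critical uses the standard-library statistics.NormalDist to obtain the familiar two-sided critical value.

```python
from statistics import NormalDist


def choose_distribution(n: int, sigma_known: bool, population_normal: bool) -> str:
    """Apply the rule of thumb stated above and return "z" or "t".

    Assumption: any case the stated rule does not assign to z falls back to
    the Student t distribution, since the passage is cut off before it gives
    its own criteria for t.
    """
    if (sigma_known and population_normal) or n > 30:
        return "z"
    return "t"


def z_critical(confidence: float) -> float:
    """Two-sided critical value from the standard normal distribution."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print(choose_distribution(12, sigma_known=True, population_normal=True))    # z
print(choose_distribution(12, sigma_known=False, population_normal=True))   # t
print(choose_distribution(45, sigma_known=False, population_normal=False))  # z
print(round(z_critical(0.95), 2))                                           # 1.96
```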

Distributions to Estimate Population Parameter

The true values of population parameters such as the population proportion, population mean, and population standard deviation (or variance) are usually unknown. These are fixed values that can only be estimated from sample data. The corresponding estimates are the sample proportion, the sample mean, and the sample standard deviation (or variance). To obtain the values of these sample statistics, data are required that have a particular distribution and central...
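
The three point estimates can be computed directly with Python's standard-library statistics module. Both samples below are invented for illustration. Note that statistics.stdev and statistics.variance use the n - 1 divisor, the usual convention for a sample standard deviation and variance.

```python
import statistics

# Hypothetical sample: 1 = respondent has the attribute of interest, 0 = does not.
responses = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
# Hypothetical sample of a numeric variable (e.g., heights in cm).
heights = [162.0, 171.5, 168.2, 175.0, 159.8, 181.3, 166.4, 170.1]

p_hat = sum(responses) / len(responses)  # sample proportion, estimates p
x_bar = statistics.mean(heights)         # sample mean, estimates mu
s = statistics.stdev(heights)            # sample standard deviation (n - 1), estimates sigma
s2 = statistics.variance(heights)        # sample variance (n - 1), estimates sigma^2

print(f"p-hat = {p_hat:.2f}, x-bar = {x_bar:.2f}, s = {s:.2f}, s^2 = {s2:.2f}")
```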

Bias

Bias is any tendency that prevents a question from being considered without prejudice. In research, bias occurs when one outcome or answer is selected or encouraged over others during sampling or testing. Bias can arise in any phase of research, including study design, data collection, analysis, and publication.
In statistics, sampling bias arises when a sample is collected from a population in such a way that some members of the population are less likely to be chosen than others (remember, each member...
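
Sampling bias can be illustrated with a small simulation. In this sketch the population (the values 1 to 100) and the selection weights are invented: a simple random sample gives every member the same chance of selection, while the biased sample makes a member's chance proportional to its value. Under those weights the biased sample mean has expected value 67 (the sum of v² divided by the sum of v), well above the population mean of 50.5, and drawing more observations does not close the gap.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the illustration is reproducible

# Hypothetical population: 100 members with values 1..100 (true mean 50.5).
population = list(range(1, 101))
true_mean = statistics.mean(population)

# Simple random sample: every member is equally likely to be chosen.
srs = rng.sample(population, k=50)

# Biased sample: selection probability grows with the member's value,
# so members are NOT equally likely to be chosen.
biased = rng.choices(population, weights=population, k=2000)

print(f"population mean:    {true_mean:.1f}")
print(f"simple random mean: {statistics.mean(srs):.1f}")
print(f"biased sample mean: {statistics.mean(biased):.1f}")
```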

Bias in Epidemiological Studies

Biases can arise at various stages of research, from study design and data collection to analysis and interpretation. Recognizing and addressing these biases is essential to ensure the validity and reliability of epidemiological findings. Broadly speaking, biases in epidemiology fall into three main categories: selection bias, information bias, and confounding. A more detailed description of possible biases is:

What are Estimates?

It is difficult to measure a parameter such as the mean height or mean weight of an entire population. Instead, we draw a sample from the population and calculate the mean height or mean weight of the individuals in the sample. These sample statistics serve as representative measures of the population parameters and are known as estimates.
The sample mean, which estimates the population mean, is denoted by x̄, whereas the population mean is denoted by μ. Further, parameters such...
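
The difference between an estimate (x̄) and the parameter it estimates (μ) shows up clearly in a simulation. The sketch below invents a normally distributed population of heights so that μ can be computed directly, which is not possible in practice. It then draws several samples of size 30, and each x̄ differs from μ by a different sampling error.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population of 10,000 heights (cm); in real studies mu is unknown.
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]
mu = statistics.mean(population)  # population parameter

# Draw five samples of size 30 and compute each sample's estimate x-bar.
x_bars = [statistics.mean(rng.sample(population, k=30)) for _ in range(5)]
for x_bar in x_bars:
    print(f"x-bar = {x_bar:.2f}  (mu = {mu:.2f}, error = {x_bar - mu:+.2f})")
```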

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test determines whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. Suppose the expected frequencies are all equal, as when predicting how often each face appears when a fair die is cast. In that case, the expected frequency for each category is the total number of observations (n) divided by the number of categories (k): E = n/k.
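
Equal expected frequencies take only a few lines to compute, and so does the Pearson chi-square statistic that compares them with observed counts. The die counts below are hypothetical. With k = 6 categories the statistic would be compared against a chi-square distribution with k - 1 = 5 degrees of freedom. SciPy's scipy.stats.chisquare performs the same calculation and, by default, treats the categories as equally likely.

```python
def expected_equal(n: int, k: int) -> float:
    """Expected count per category when all k categories are equally likely: E = n / k."""
    return n / k


def chi_square_statistic(observed: list[int]) -> float:
    """Pearson statistic sum((O - E)^2 / E) against equal expected counts."""
    n, k = sum(observed), len(observed)
    e = expected_equal(n, k)
    return sum((o - e) ** 2 / e for o in observed)


# Hypothetical counts of faces 1..6 from 60 rolls of a die.
rolls = [8, 12, 9, 11, 10, 10]
print(expected_equal(sum(rolls), len(rolls)))  # 10.0
print(chi_square_statistic(rolls))             # about 1.0
```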

Bias in Zipf's law estimators.

Charlie Pilgrim¹, Thomas T Hills²,³

¹Mathematics for Real-World Systems Centre for Doctoral Training, The University of Warwick, Coventry, CV4 7AL, UK. charlie.pilgrim@warwick.ac.uk.

Scientific Reports, August 28, 2021

Summary

This summary is machine-generated.

Maximum likelihood estimators for power law models are biased due to an incorrect likelihood function. Even approximate Bayesian computation (ABC) methods show bias when applied to natural language rank-frequency data.

Area of Science:

Quantitative Linguistics
Statistical Modeling
Computational Linguistics

Background:

Rank-frequency distributions, often modeled by power laws (e.g., Zipf's Law), are common in natural language.
Existing maximum likelihood estimators (MLEs) for inferring power law exponents from such data are known to be biased.
The bias in MLEs stems from the use of an inappropriate likelihood function.
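
A rank-frequency distribution is built by counting each word type and ranking the types from most to least frequent. The sketch below does this for a toy sentence. It also defines a Zipf probability for rank r, written here as p(r) = r^(-λ) / Σ_{j=1..V} j^(-λ) for a vocabulary of V word types. That notation is chosen for this sketch and is not necessarily the paper's, and the function names are hypothetical.

```python
from collections import Counter


def rank_frequency(tokens: list[str]) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, rank 1 being the most frequent word.

    Ties keep first-occurrence order (Counter.most_common preserves insertion
    order for equal counts); real analyses may break ties differently.
    """
    ranked = Counter(tokens).most_common()
    return [(r, word, count) for r, (word, count) in enumerate(ranked, start=1)]


def zipf_pmf(rank: int, exponent: float, vocab_size: int) -> float:
    """Probability of a given rank under a Zipf law truncated at vocab_size."""
    norm = sum(j ** -exponent for j in range(1, vocab_size + 1))
    return rank ** -exponent / norm


text = "the cat sat on the mat and the dog sat on the log"
for rank, word, count in rank_frequency(text.split()):
    print(rank, word, count)
```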

Purpose of the Study:

To derive the correct likelihood function for power law inference from rank-frequency data.
To investigate the bias of existing and novel estimators, including approximate Bayesian computation (ABC), for power law models.
To assess how assuming that words are drawn from a simple probability distribution, rather than produced by the more complex processes of natural language, affects bias.
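
The summary does not say which existing estimator it means. This sketch assumes one common form, which treats each token's empirical rank as if it were its true rank under a truncated Zipf law. Under that assumption, the log-likelihood of rank-ordered counts n_r is log L(λ) = -λ Σ_r n_r ln r - N ln Σ_{j=1..V} j^(-λ), where N is the total number of tokens. The code maximizes this with a golden-section search, which is valid because the log-likelihood is concave in λ, and applies it to counts simulated from a Zipf law with λ = 1. It illustrates the mechanics only. It does not reproduce the paper's bias measurements, and the simulation settings are invented.

```python
import math
import random


def log_likelihood(exponent: float, counts: list[int]) -> float:
    """Log-likelihood of rank-ordered counts under a truncated Zipf law.

    counts[r - 1] is the frequency of the word at empirical rank r. Treating the
    empirical rank as the true rank is the assumed simplification here.
    """
    n_tokens = sum(counts)
    log_norm = math.log(sum(j ** -exponent for j in range(1, len(counts) + 1)))
    weighted_log_ranks = sum(c * math.log(r) for r, c in enumerate(counts, start=1))
    return -exponent * weighted_log_ranks - n_tokens * log_norm


def mle_exponent(counts: list[int], lo: float = 0.01, hi: float = 5.0,
                 tol: float = 1e-8) -> float:
    """Maximize the concave log-likelihood by golden-section search on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c, counts) > log_likelihood(d, counts):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


# Hypothetical data: 5,000 tokens from a Zipf law with exponent 1 over 100 word types.
rng = random.Random(1)
vocab_size, true_exponent, n_tokens = 100, 1.0, 5_000
weights = [r ** -true_exponent for r in range(1, vocab_size + 1)]
draws = rng.choices(range(vocab_size), weights=weights, k=n_tokens)
freq = [0] * vocab_size
for word in draws:
    freq[word] += 1
counts = sorted(freq, reverse=True)  # empirical rank order

estimate = mle_exponent(counts)
print(f"true exponent = {true_exponent}, estimate = {estimate:.3f}")
```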

Main Methods:

Derivation of the theoretically correct likelihood function for power law models.
Implementation and evaluation of an approximate Bayesian computation (ABC) method.
Analysis of bias in estimators when applied to both idealized Zipfian distributions and natural language data.
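
A minimal rejection-ABC sketch shows the general idea behind the approach named in the methods. It draws a candidate exponent from a prior, simulates a dataset of the same size, ranks it the same way as the observed data, and keeps the candidates whose simulations lie closest to the observations. The summary does not describe the paper's prior, summary statistic, or acceptance rule. The uniform prior on [0.5, 2.0], the log-count distance, and the keep-the-closest-10% rule below are therefore all assumptions, and the function names are hypothetical.

```python
import math
import random


def simulate_ranked_counts(exponent: float, vocab_size: int, n_tokens: int,
                           rng: random.Random) -> list[int]:
    """Draw n_tokens words from a truncated Zipf law, then rank by observed count."""
    weights = [r ** -exponent for r in range(1, vocab_size + 1)]
    draws = rng.choices(range(vocab_size), weights=weights, k=n_tokens)
    freq = [0] * vocab_size
    for word in draws:
        freq[word] += 1
    # Rank the simulated data exactly as the observed data were ranked.
    return sorted(freq, reverse=True)


def distance(a: list[int], b: list[int]) -> float:
    """Assumed summary distance: mean |log(1 + a_r) - log(1 + b_r)| over ranks."""
    return sum(abs(math.log1p(x) - math.log1p(y)) for x, y in zip(a, b)) / len(a)


def abc_rejection(observed: list[int], n_proposals: int = 500, n_keep: int = 50,
                  prior: tuple[float, float] = (0.5, 2.0), seed: int = 0) -> float:
    """Rejection ABC: keep the proposals whose simulations lie closest to the data."""
    rng = random.Random(seed)
    vocab_size, n_tokens = len(observed), sum(observed)
    scored = []
    for _ in range(n_proposals):
        exponent = rng.uniform(*prior)  # candidate drawn from a uniform prior
        simulated = simulate_ranked_counts(exponent, vocab_size, n_tokens, rng)
        scored.append((distance(simulated, observed), exponent))
    scored.sort()
    accepted = [exponent for _, exponent in scored[:n_keep]]
    return sum(accepted) / len(accepted)  # posterior-mean estimate


# Hypothetical "observed" data generated with exponent 1.2.
observed = simulate_ranked_counts(1.2, vocab_size=50, n_tokens=1_000, rng=random.Random(123))
print(f"ABC estimate: {abc_rejection(observed):.3f}")
```

Ranking each simulated dataset before comparing it with the observations means the simulations pass through the same ranking step as the data. This is one plausible reason an ABC approach can avoid the likelihood shortcut described above.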

Main Results:

The correct likelihood function is computationally intractable.
The proposed ABC method exhibits reduced bias compared to traditional MLEs for idealized data.
Both traditional MLEs and the investigated ABC method retain significant bias when applied to natural language due to the assumption of simple probability distributions.

Conclusions:

Researchers must be aware of inherent biases when using current methods to infer power laws from rank-frequency data, especially in natural language.
The assumption of simple probability distributions is a critical limitation for accurate Zipf exponent estimation in linguistics.
Further development of methods accounting for the complexity of natural language processes is needed for unbiased power law inference.