Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching.

Paul T von Hippel¹, David J Hunter², McKalie Drown²

¹University of Texas at Austin.

Sociological Science

February 7, 2024

Summary

This summary is machine-generated.

This study introduces a faster and more accurate method for estimating income distributions from binned data using interpolated cumulative distribution functions (CDFs). Constraining these estimates to match a known mean substantially improves the accuracy of income statistics such as the Gini coefficient.

Keywords:

Gini; grouped data; income brackets; inequality

Area of Science:

Economics
Statistics
Data Science

Background:

Estimating income statistics from binned data is common.
Existing methods, such as assigning cases to bin midpoints or fitting parametric distributions, are limited in accuracy or speed.
Accurate income distribution estimation is crucial for socioeconomic analysis.
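As a baseline for the midpoint approach mentioned above, here is a minimal Python sketch that places every case in a bin at that bin's midpoint and computes the mean and Gini coefficient of the resulting grouped data. The open-ended top bin has no midpoint, so its representative value is simply passed in as an assumption. The function names and the toy bin edges and counts are hypothetical and are not taken from the study.

```python
from typing import Sequence


def midpoint_values(edges: Sequence[float], top_value: float) -> list[float]:
    """Midpoint of each closed bin [edges[k], edges[k+1]), plus an assumed
    value for the open-ended top bin [edges[-1], infinity)."""
    mids = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return mids + [top_value]


def grouped_mean(values: Sequence[float], counts: Sequence[int]) -> float:
    """Mean when every case in a bin takes that bin's representative value."""
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


def grouped_gini(values: Sequence[float], counts: Sequence[int]) -> float:
    """Gini = sum_i sum_j n_i n_j |x_i - x_j| / (2 N^2 mean)."""
    n = sum(counts)
    mu = grouped_mean(values, counts)
    abs_diff = sum(
        ci * cj * abs(vi - vj)
        for vi, ci in zip(values, counts)
        for vj, cj in zip(values, counts)
    )
    return abs_diff / (2 * n * n * mu)


# Toy example: bins [0, 10), [10, 20), and an open top bin [20, inf).
edges = [0, 10, 20]
counts = [1, 1, 2]  # one count per bin, including the top bin
values = midpoint_values(edges, top_value=30)  # 30 is an assumed top-bin value
print(grouped_mean(values, counts))  # 20.0
print(grouped_gini(values, counts))  # 0.28125
```

Because every case in a bin is treated as identical, this estimator ignores inequality within bins, and its results depend heavily on the value assumed for the top bin.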

Purpose of the Study:

To develop and evaluate improved methods for estimating income statistics from binned data.
To compare the performance of nonparametric interpolated cumulative distribution functions (CDFs) against traditional methods.
To assess the impact of constraining estimates to a known mean on accuracy.

Main Methods:

Fitting nonparametric continuous distributions by interpolating the cumulative distribution function (CDF) to match bin counts.
Constraining both interpolated CDFs and bin midpoints to reproduce a known mean income.
Evaluating Gini coefficient estimation accuracy across 3,221 U.S. counties.
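The interpolation and mean-matching steps above can be illustrated with the simplest interpolated CDF: a piecewise-linear CDF that passes through the cumulative bin proportions at each bin edge, which is equivalent to spreading cases uniformly within each bin. This is a minimal Python sketch under stated assumptions, not the authors' implementation or the `binsmooth`/`rpme` code. The study's interpolation may be smoother than linear, and here the known mean is matched by choosing the unknown upper end of the open top bin, which is only one simple way to impose the constraint. A helper also shows the analogous mean-matched top-bin value for the midpoint method. All function names and toy numbers are hypothetical.

```python
from typing import Sequence


def fit_linear_cdf(edges: Sequence[float], counts: Sequence[int],
                   known_mean: float) -> tuple[list[float], list[float]]:
    """Knots (x, F) of a piecewise-linear CDF through the cumulative bin
    proportions. The last bin is open-ended; its upper end is chosen so the
    fitted distribution's mean equals known_mean (an assumed constraint rule)."""
    n = sum(counts)
    props = [c / n for c in counts]
    if props[-1] == 0:
        raise ValueError("top bin is empty; cannot adjust it to match the mean")
    # Uniform spread within a closed bin contributes p * (lo + hi) / 2 to the mean.
    closed = sum(p * (lo + hi) / 2
                 for p, lo, hi in zip(props[:-1], edges[:-1], edges[1:]))
    lo_top = edges[-1]
    # Solve props[-1] * (lo_top + upper) / 2 = known_mean - closed for upper.
    upper = 2 * (known_mean - closed) / props[-1] - lo_top
    if upper <= lo_top:
        raise ValueError("known mean is too low for this bin layout")
    cum = [0.0]
    for p in props:
        cum.append(cum[-1] + p)
    cum[-1] = 1.0  # guard against floating-point drift
    return list(edges) + [upper], cum


def cdf_mean(xs: Sequence[float], cum: Sequence[float]) -> float:
    """Mean of the piecewise-linear CDF: bin proportion times bin midpoint, summed."""
    return sum((f1 - f0) * (a + b) / 2
               for a, b, f0, f1 in zip(xs[:-1], xs[1:], cum[:-1], cum[1:]))


def cdf_gini(xs: Sequence[float], cum: Sequence[float]) -> float:
    """Gini = (1 / mean) * integral of F(x) * (1 - F(x)) dx, integrated exactly
    on each segment where F is linear (assumes a positive mean)."""
    total = 0.0
    for a, b, f0, f1 in zip(xs[:-1], xs[1:], cum[:-1], cum[1:]):
        if f1 > f0:
            # Average of f * (1 - f) as f runs linearly from f0 to f1.
            avg = ((f1**2 - f0**2) / 2 - (f1**3 - f0**3) / 3) / (f1 - f0)
        else:
            avg = f0 * (1 - f0)  # empty bin: F is flat on this segment
        total += (b - a) * avg
    return total / cdf_mean(xs, cum)


def mean_matched_top_midpoint(edges: Sequence[float], counts: Sequence[int],
                              known_mean: float) -> float:
    """Top-bin value that makes the grouped midpoint mean equal known_mean."""
    closed_total = sum(c * (lo + hi) / 2
                       for c, lo, hi in zip(counts[:-1], edges[:-1], edges[1:]))
    return (known_mean * sum(counts) - closed_total) / counts[-1]


# Toy example: bins [0, 10), [10, 20), open top bin [20, inf); known mean 20.
edges, counts = [0, 10, 20], [1, 1, 2]
xs, cum = fit_linear_cdf(edges, counts, known_mean=20)
print(xs, cum)  # [0, 10, 20, 40.0] [0.0, 0.25, 0.5, 1.0]
print(cdf_mean(xs, cum))  # 20.0
print(round(cdf_gini(xs, cum), 6))  # 0.333333
print(mean_matched_top_midpoint(edges, counts, known_mean=20))  # 30.0
```

By construction the fitted CDF reproduces the bin counts exactly, since F at each edge equals the cumulative bin proportion. Unlike the midpoint approach, it also spreads cases within each bin, which raises the estimated Gini relative to placing everyone at the bin midpoint.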

Main Results:

Interpolated CDFs accurately reproduce bin counts and are faster than parametric methods.
Constraining estimates to a known mean dramatically improves accuracy for both interpolated CDFs and midpoints.
Interpolated CDFs offer a slight accuracy improvement over constrained midpoints.

Conclusions:

Nonparametric interpolated CDFs provide a superior method for estimating income distributions from binned data.
Matching estimates to a known mean is a critical step for enhancing the reliability of income statistics.
Software packages 'binsmooth' (R) and 'rpme' (Stata) are available for implementing these methods.