Related Concept Videos

Measures of Central Tendency (02:16) · 16.0K

The "center" of a data set is also a way of describing location. The two most widely used measures of the "center" of the data are the mean (average) and the median. The words "mean" and "average" are often used interchangeably. The substitution of one word for the other is common practice. The technical term is "arithmetic mean" and "average" is technically a center location. However, in practice among non-statisticians,...
Trimmed Mean (01:10) · 2.9K

When the mean of a data set is used as a measure of its central tendency, care is needed, whether it is the arithmetic, geometric, or harmonic mean. A single outlier can significantly affect the mean; that is, the mean is sensitive to fluctuations in the data set.
Although certain measures of central tendency are not sensitive to outliers, there are alternative versions of the mean that get around the...
Skewness (01:06) · 11.1K

The measures of central tendency calculated from a data set may not reveal much about its underlying distribution. When the data set’s values are plotted, the mean and the median may differ, and the plot may have more values on one side of these central values than on the other. Such a data set is said to be skewed, with the direction of skew named for the side of its longer tail.
The longer the tail of the plot on one side, the more skewed it is. The skewness of a data set’s values suggests that the measures of central tendency...
Distributions to Estimate Population Parameter (01:26) · 4.1K

The true values of population parameters such as the population proportion, population mean, and population standard deviation (or variance) are usually unknown. These are fixed values that can only be estimated from data collected from samples. The corresponding estimates are the sample proportion, the sample mean, and the sample standard deviation (or variance). To obtain the values of these sample statistics, data are required that have particular distribution and central...
Weighted Mean (00:57) · 5.2K

When taking the arithmetic, geometric, or harmonic mean of a sample data set, equal importance is assigned to all the data points. In some data sets, however, not all values are equally important, and an intrinsic bias may make it appropriate to give more weight to specific values than to others.
For example, consider the number of goals scored in the matches of a tournament. While computing the average number of goals scored in the tournament, it may be more important to...
Sampling Distribution (01:12) · 12.6K

Given simple random samples of size n from a population, with a characteristic such as the mean, proportion, or standard deviation measured for each sample, the probability distribution of that characteristic across all samples is called a sampling distribution. How much the statistic varies from one sample to another is known as its sampling variability, which is typically measured by its standard error. The standard error of the mean is an example...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Lessons from genome-wide association studies for epidemiology.

Epidemiology (Cambridge, Mass.)·2012
Same author

Circadian genes and breast cancer susceptibility in rotating shift workers.

International journal of cancer·2012
Same author

Recurrence of radicular pain or back pain after nonsurgical treatment of symptomatic lumbar disk herniation.

Archives of physical medicine and rehabilitation·2012
Same author

Interactions between genome-wide significant genetic variants and circulating concentrations of insulin-like growth factor 1, sex hormones, and binding proteins in relation to prostate cancer risk in the National Cancer Institute Breast and Prostate Cancer Cohort Consortium.

American journal of epidemiology·2012
Same author

Common breast cancer susceptibility variants in LSP1 and RAD51L1 are associated with mammographic density measures that predict breast cancer risk.

Cancer epidemiology, biomarkers & prevention : a publication of the American Association for Cancer Research, cosponsored by the American Society of Preventive Oncology·2012
Same author

Detection of osteophytes and subchondral cysts in the knee with use of tomosynthesis.

Radiology·2012
Same journal

Signs of the End of the Paradox? Cohort Shifts in Smoking and Obesity and the Hispanic Life Expectancy Advantage.

Sociological science·2026
Same journal

The causal impact of segregation on a disparity: A gap-closing approach.

Sociological science·2026
Same journal

Demographic Differences in Responses to a Two-Step Gender Identity Measure.

Sociological science·2026
Same journal

Labor Market Consequences of Grandparenthood.

Sociological science·2025
Same journal

Can't Catch a Break: Intersectional Inequalities at Work.

Sociological science·2024
Same journal

Socioeconomic, Ethnic, Racial, and Gender Gaps in Children's Social/Behavioral Skills: Do They Grow Faster in School or out?

Sociological science·2024

Related Experiment Video

Updated: Jul 4, 2025

A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types (12:39)

Published on: December 10, 2012 · 11.4K

Better Estimates from Binned Income Data: Interpolated CDFs and Mean-Matching.

Paul T von Hippel¹, David J Hunter², McKalie Drown²

  • ¹University of Texas at Austin.

Sociological Science | February 7, 2024
Summary
This summary is machine-generated.

This study introduces a faster, more accurate method for estimating income distributions from binned data using interpolated cumulative distribution functions (CDFs). Constraining these estimates to a known mean significantly improves accuracy for income statistics like Gini coefficients.
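To make the mean constraint concrete, here is a minimal Python sketch of one simple way to constrain bin midpoints to a known mean: keep the midpoints of the closed bins and solve for the single value assigned to the open-ended top bin. The bin edges, counts, and known mean are hypothetical, and the function names are illustrative; this is not the code of the 'rpme' or 'binsmooth' packages, which may treat the top bin differently.

```python
from typing import List, Sequence


def closed_bin_midpoints(edges: Sequence[float]) -> List[float]:
    # Bin i spans [edges[i], edges[i+1]); the last bin (from edges[-1] up) is open.
    return [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]


def mean_constrained_top_value(
    edges: Sequence[float], counts: Sequence[int], known_mean: float
) -> float:
    """Value assigned to the open top bin so the overall mean equals known_mean."""
    if len(edges) != len(counts):
        raise ValueError("need one lower bound per bin")
    mids = closed_bin_midpoints(edges)
    n_total = sum(counts)
    closed_total = sum(m * c for m, c in zip(mids, counts[:-1]))
    # Solve n_total * known_mean = closed_total + counts[-1] * top_value.
    top_value = (n_total * known_mean - closed_total) / counts[-1]
    if top_value < edges[-1]:
        # A value below the top bin's lower bound would contradict the bins.
        raise ValueError("known mean is too low to be matched by the top bin alone")
    return top_value


if __name__ == "__main__":
    edges = [0, 10_000, 25_000, 50_000, 100_000]  # hypothetical bracket lower bounds
    counts = [20, 30, 25, 15, 10]  # hypothetical household counts
    top = mean_constrained_top_value(edges, counts, known_mean=45_000)
    values = closed_bin_midpoints(edges) + [top]
    mean = sum(v * c for v, c in zip(values, counts)) / sum(counts)
    print(top, mean)  # 181250.0 45000.0
```

Because only the top bin absorbs the adjustment, the constraint fails when the known mean is lower than the closed-bin midpoints allow; the sketch raises an error in that case rather than guessing.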

Keywords:
Gini, grouped data, income brackets, inequality

More Related Videos

A Method of Trigonometric Modelling of Seasonal Variation Demonstrated with Multiple Sclerosis Relapse Data (10:46)

Published on: December 9, 2015 · 10.7K

Measuring Delay Discounting in Humans Using an Adjusting Amount Task (07:47)

Published on: January 9, 2016 · 15.4K

Area of Science:

  • Economics
  • Statistics
  • Data Science

Background:

  • Estimating income statistics from binned data is common.
  • Existing methods like bin midpoints or parametric distributions have limitations in accuracy and speed.
  • Accurate income distribution estimation is crucial for socioeconomic analysis.
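For reference, the traditional bin-midpoint approach can be sketched in a few lines of Python: every case in a bin is assigned the bin's midpoint, and the mean and Gini coefficient are computed from those grouped values. The data are hypothetical, and the value used for the open-ended top bin is an assumption the analyst must supply, since the counts alone cannot determine it.

```python
from typing import List, Sequence


def midpoint_values(edges: Sequence[float], top_value: float) -> List[float]:
    # Bin i spans [edges[i], edges[i+1]); the open top bin gets an assumed value.
    mids = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
    return mids + [top_value]


def grouped_mean(values: Sequence[float], counts: Sequence[int]) -> float:
    return sum(v * c for v, c in zip(values, counts)) / sum(counts)


def grouped_gini(values: Sequence[float], counts: Sequence[int]) -> float:
    # Gini of the discrete distribution that places counts[i] cases at values[i]:
    # G = sum_i sum_j n_i n_j |x_i - x_j| / (2 N^2 mu)
    n = sum(counts)
    mu = grouped_mean(values, counts)
    total = sum(
        ni * nj * abs(xi - xj)
        for xi, ni in zip(values, counts)
        for xj, nj in zip(values, counts)
    )
    return total / (2 * n * n * mu)


if __name__ == "__main__":
    edges = [0, 10_000, 25_000, 50_000, 100_000]  # hypothetical bracket lower bounds
    counts = [20, 30, 25, 15, 10]  # hypothetical household counts
    # Assumption: the open top bin (100,000 and up) is represented by 150,000.
    values = midpoint_values(edges, top_value=150_000)
    print(grouped_mean(values, counts))  # 41875.0
    print(round(grouped_gini(values, counts), 3))
```

Changing `top_value` shifts both the estimated mean and the Gini coefficient, which illustrates why the top bin is a weak point of the unconstrained midpoint method.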

Purpose of the Study:

  • To develop and evaluate improved methods for estimating income statistics from binned data.
  • To compare the performance of nonparametric interpolated cumulative distribution functions (CDFs) against traditional methods.
  • To assess the impact of constraining estimates to a known mean on accuracy.

Main Methods:

  • Fitting nonparametric continuous distributions by interpolating the cumulative distribution function (CDF) to match bin counts.
  • Constraining both interpolated CDFs and bin midpoints to reproduce a known mean income.
  • Evaluating Gini coefficient estimation accuracy across 3,221 U.S. counties.
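The interpolated-CDF idea can be sketched in Python under simplifying assumptions. This sketch interpolates the CDF linearly between bin edges (equivalent to a uniform density within each bin), so the fitted CDF reproduces the bin counts exactly; it then chooses the open top bin's upper bound so that the fitted mean equals a known mean, and computes the Gini coefficient from the identity G = 1 - (1/μ)∫(1 - F(x))² dx for a nonnegative variable. Linear interpolation is a swapped-in simplification: the paper and the 'binsmooth' package may use smoother interpolants, and the function names and data below are hypothetical, not that package's API.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def fit_linear_cdf(
    edges: Sequence[float], counts: Sequence[int], known_mean: float
) -> Tuple[List[float], List[float]]:
    """Knots (xs, cdf) of a piecewise-linear CDF matching the bin counts and known_mean.

    Bin i spans [edges[i], edges[i+1]); the last bin starts at edges[-1], and its
    upper bound is chosen to match the mean. Assumes edges[0] >= 0.
    """
    n = sum(counts)
    shares = [c / n for c in counts]
    # A linear CDF means a uniform density within each bin, so each closed bin
    # contributes share * midpoint to the mean.
    closed_mean = sum(
        p * (lo + hi) / 2 for p, lo, hi in zip(shares[:-1], edges[:-1], edges[1:])
    )
    top_mid = (known_mean - closed_mean) / shares[-1]
    upper = 2 * top_mid - edges[-1]
    if upper <= edges[-1]:
        raise ValueError("known mean is too low for this top bin")
    xs = list(edges) + [upper]
    cdf = [0.0]
    for p in shares:
        cdf.append(cdf[-1] + p)
    cdf[-1] = 1.0  # remove floating-point drift in the cumulative sum
    return xs, cdf


def cdf_at(x: float, xs: Sequence[float], cdf: Sequence[float]) -> float:
    # Linear interpolation between knots; 0 below and 1 above the support.
    if x <= xs[0]:
        return 0.0
    if x >= xs[-1]:
        return 1.0
    k = bisect_right(xs, x) - 1
    t = (x - xs[k]) / (xs[k + 1] - xs[k])
    return cdf[k] + t * (cdf[k + 1] - cdf[k])


def mean_and_gini(xs: Sequence[float], cdf: Sequence[float]) -> Tuple[float, float]:
    # With S(x) = 1 - F(x): mean = integral of S, Gini = 1 - (integral of S^2) / mean,
    # both taken from 0. S equals 1 on [0, xs[0]] and is linear between knots,
    # so every piece integrates exactly.
    int_s = int_s2 = 0.0
    for k in range(len(xs) - 1):
        w = xs[k + 1] - xs[k]
        a, b = 1 - cdf[k], 1 - cdf[k + 1]
        int_s += w * (a + b) / 2
        int_s2 += w * (a * a + a * b + b * b) / 3
    mean = xs[0] + int_s
    return mean, 1 - (xs[0] + int_s2) / mean


if __name__ == "__main__":
    edges = [0, 10_000, 25_000, 50_000, 100_000]  # hypothetical bracket lower bounds
    counts = [20, 30, 25, 15, 10]  # hypothetical household counts
    xs, cdf = fit_linear_cdf(edges, counts, known_mean=45_000)
    print(xs[-1])  # fitted upper bound of the top bin
    # The fitted CDF reproduces the cumulative bin shares at the edges.
    print([round(cdf_at(x, xs, cdf), 2) for x in edges])  # [0.0, 0.2, 0.5, 0.75, 0.9]
    print(mean_and_gini(xs, cdf))
```

The linear interpolant was chosen here because it keeps both integrals in closed form; a smoother monotone interpolant would generally require numerical integration for the mean and Gini.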

Main Results:

  • Interpolated CDFs accurately reproduce bin counts and are faster than parametric methods.
  • Constraining estimates to a known mean dramatically improves accuracy for both interpolated CDFs and midpoints.
  • Interpolated CDFs offer a slight accuracy improvement over constrained midpoints.

Conclusions:

  • Nonparametric interpolated CDFs provide a superior method for estimating income distributions from binned data.
  • Matching estimates to a known mean is a critical step for enhancing the reliability of income statistics.
  • Software packages 'binsmooth' (R) and 'rpme' (Stata) are available for implementing these methods.