Estimating Identification Disclosure Risk Using Mixed Membership Models.

Daniel Manrique-Vallier¹, Jerome P Reiter²

¹Postdoctoral Associate at the Social Science Research Institute and the Department of Statistical Science, Duke University, Durham, NC 27708-0251.

Journal of the American Statistical Association

September 13, 2014

Summary

This summary is machine-generated.

Statistical agencies must protect data confidentiality. Bayesian Grade of Membership (GoM) models offer more accurate disclosure risk assessments for sparse data than traditional log-linear models.

Keywords:

Confidentiality, Contingency table, Disclosure, Grade of membership, Latent class


Area of Science:

Statistics
Data Privacy
Computational Statistics

Background:

Statistical agencies must protect data confidentiality to prevent re-identification.
Disclosure risk assessments often involve estimating the probability that a record is unique on a set of key variables.
Log-linear models are commonly used for this purpose but can give biased estimates when the data are sparse.
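
As a rough illustration of what "unique on the key variables" means, the sketch below cross-classifies a handful of records and flags those whose combination of key values occurs exactly once in the sample. The function name `sample_uniques`, the field names, and the toy records are hypothetical and are not taken from the study. A sample unique is not necessarily unique in the population; estimating that population-level probability is what the models discussed here are for.

```python
from collections import Counter


def sample_uniques(records, key_vars):
    """Return the indices of records whose key-variable combination occurs once."""
    # Build the cell of the contingency table that each record falls into.
    keys = [tuple(rec[v] for v in key_vars) for rec in records]
    counts = Counter(keys)
    return [i for i, k in enumerate(keys) if counts[k] == 1]


# Hypothetical toy data: three key variables, four records.
records = [
    {"age_group": "30-39", "sex": "F", "county": "A"},
    {"age_group": "30-39", "sex": "F", "county": "A"},
    {"age_group": "40-49", "sex": "M", "county": "B"},
    {"age_group": "70-79", "sex": "F", "county": "C"},
]

unique_idx = sample_uniques(records, ["age_group", "sex", "county"])
print(unique_idx)  # [2, 3]
```

With many key variables and few records, most cells of the table are empty or hold a single record, which is the sparse setting described above.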

Purpose of the Study:

To propose an alternative to log-linear models for disclosure risk assessment in sparse datasets.
To introduce a Bayesian Grade of Membership (GoM) model for multinomial variables.
To evaluate the accuracy of Bayesian GoM models compared to log-linear models.

Main Methods:

Developed a Bayesian Grade of Membership (GoM) model for multinomial variables.
Implemented an MCMC algorithm for model fitting.
Evaluated model performance using US Census Bureau microdata samples.
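
The general structure of a Grade of Membership model for categorical variables can be sketched as a simulation. This is a minimal sketch of the generic mixed membership formulation, not the specification, priors, or MCMC sampler used in the study: each individual gets a membership vector over a small number of extreme profiles, and each variable is drawn from the profile selected for that variable. All names (`simulate_gom`, `alpha`, `lam`) and the parameter values are assumptions for illustration.

```python
import random


def draw_dirichlet(alpha, rng):
    """Draw a Dirichlet vector by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_gom(n, alpha, lam, seed=0):
    """Simulate n records from a generic Grade of Membership model.

    alpha     -- Dirichlet parameters, one per extreme profile (assumed values)
    lam[j][k] -- probabilities over the levels of variable j under profile k
    """
    rng = random.Random(seed)
    n_profiles = len(alpha)
    data = []
    for _ in range(n):
        # Individual-level membership scores over the extreme profiles.
        g = draw_dirichlet(alpha, rng)
        record = []
        for var_probs in lam:
            # Each variable picks its own profile according to g ...
            k = rng.choices(range(n_profiles), weights=g)[0]
            # ... and then a level from that profile's distribution.
            levels = range(len(var_probs[k]))
            record.append(rng.choices(levels, weights=var_probs[k])[0])
        data.append(tuple(record))
    return data


# Hypothetical setup: two profiles, two variables with 3 and 2 levels.
lam = [
    [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
    [[0.9, 0.1], [0.2, 0.8]],
]
data = simulate_gom(200, alpha=[0.5, 0.5], lam=lam, seed=1)
print(data[:5])
```

Because the profile is redrawn for every variable, a single individual can mix characteristics of several profiles, which is what distinguishes this structure from a latent class model in which each individual belongs to exactly one class.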

Main Results:

Bayesian GoM models estimate the total number of unique records in samples more accurately than log-linear models.
GoM models yield superior record-level predictions of uniqueness.
The proposed method mitigates bias issues inherent in log-linear models with sparse contingency tables.

Conclusions:

Bayesian GoM models are a more reliable alternative for disclosure risk assessment with sparse data.
This approach enhances the accuracy of confidentiality protection for statistical agencies.
The study demonstrates the practical utility of Bayesian GoM models in real-world data privacy scenarios.