Updated: Jun 5, 2025

On GEE for Mean-Variance-Correlation Models: Variance Estimation and Model Selection.

Zhenyu Xu¹, Jason P Fine², Wenling Song³

¹Department of Statistics, University of Connecticut, Storrs, Connecticut.

Statistics in Medicine, December 12, 2024

Summary

This summary is machine-generated.

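This work improves generalized estimating equations (GEE) analysis of clustered data in two ways. It identifies settings where Luo and Pan's variance estimators are limited and shows that Yan and Fine's estimators handle the dependencies involved correctly, and it proposes a criterion for selecting the mean, variance, and correlation models simultaneously.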

Keywords:

generalized estimating equations; model selection criterion; sandwich estimator; working covariance structure

Area of Science:

Statistics
Biostatistics
Econometrics

Background:

Generalized estimating equations (GEE) allow clustered data to be analyzed without specifying a full multivariate distribution.
Recent methods, like Luo and Pan's, jointly model mean, variance, and correlation.
These models are specific cases of Yan and Fine's more general estimating equations.
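
For orientation, the standard GEE setup for the mean parameters (Liang and Zeger's formulation) can be written as below. The notation is generic and is an assumption here; it may differ from the notation used in the paper. K is the number of clusters, y_i and \mu_i(\beta) are the response and mean vectors for cluster i, A_i is the diagonal matrix of marginal variances, and R_i(\alpha) is the working correlation matrix.

```latex
\[
  \sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl\{ y_i - \mu_i(\beta) \bigr\} = 0,
  \qquad
  D_i = \frac{\partial \mu_i(\beta)}{\partial \beta^{\top}},
\]
\[
  V_i = A_i^{1/2} \, R_i(\alpha) \, A_i^{1/2}.
\]
```

In a joint mean-variance-correlation model, the variances in A_i and the correlation parameters \alpha are given regression models of their own rather than being treated as nuisance quantities.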

Purpose of the Study:

To address limitations in Luo and Pan's variance and correlation estimation for clustered data.
To introduce a novel model selection criterion for simultaneous mean-scale-correlation model selection.
To extend the geepack R package for enhanced covariance matrix flexibility.

Main Methods:

Characterizing model settings where Luo and Pan's variance estimators face challenges.
Illustrating how Yan and Fine's estimators correctly handle nested dependencies.
Developing and applying a new model selection criterion.
Utilizing sandwich variance estimators and simulation studies.
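
To make the idea of a sandwich variance estimator concrete, here is a minimal Python sketch of the classic cluster-robust estimator for the mean parameters only, assuming an identity link and an independence working correlation. It is not the estimator studied in the paper and does not reproduce Luo and Pan's or Yan and Fine's methods; the function name `gee_independence_sandwich` and the simulated data are hypothetical and exist only for illustration.

```python
import numpy as np


def gee_independence_sandwich(X, y, groups):
    """Linear-mean GEE with an independence working correlation.

    Returns the coefficient estimate and its cluster-robust (sandwich)
    covariance matrix. Illustrative only; not the estimator from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)

    # With an identity link and independence working correlation, the
    # estimating equation sum_i X_i^T (y_i - X_i beta) = 0 is solved by
    # ordinary least squares.
    bread = X.T @ X                       # "bread": sum_i X_i^T X_i
    beta = np.linalg.solve(bread, X.T @ y)
    resid = y - X @ beta

    # "Meat": sum over clusters of the outer product of cluster scores.
    meat = np.zeros_like(bread)
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]     # X_i^T r_i for cluster i
        meat += np.outer(score, score)

    bread_inv = np.linalg.inv(bread)
    cov = bread_inv @ meat @ bread_inv    # sandwich: B^{-1} M B^{-1}
    return beta, cov


# Simulated clustered data (hypothetical): 50 clusters of size 4, with a
# shared random intercept that induces within-cluster correlation.
rng = np.random.default_rng(0)
n_clusters, size = 50, 4
groups = np.repeat(np.arange(n_clusters), size)
x = rng.normal(size=n_clusters * size)
X = np.column_stack([np.ones_like(x), x])
cluster_effect = np.repeat(rng.normal(size=n_clusters), size)
y = 1.0 + 2.0 * x + cluster_effect + rng.normal(size=x.size)

beta, cov = gee_independence_sandwich(X, y, groups)
robust_se = np.sqrt(np.diag(cov))
```

The sandwich form keeps the standard errors valid even when the working correlation is misspecified, which is why the sketch can use independence despite the correlated simulated data. Estimators for joint mean-variance-correlation models stack additional estimating equations for the variance and correlation parameters, which this sketch omits.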

Main Results:

Identified specific scenarios where Luo and Pan's variance estimation approach is limited.
Demonstrated the effectiveness of Yan and Fine's estimators in accounting for dependencies.
Validated the proposed model selection criterion through simulations and real data.
Extended geepack with new options for working covariance matrices.

Conclusions:

The proposed methods offer improved variance estimation and model selection for clustered data analysis.
The enhanced geepack package provides greater flexibility in analyzing complex correlated data structures.
Together, these contributions make joint modeling of mean, variance, and correlation with generalized estimating equations more reliable.