


On GEE for Mean-Variance-Correlation Models: Variance Estimation and Model Selection.

Zhenyu Xu¹, Jason P Fine², Wenling Song³

  • ¹Department of Statistics, University of Connecticut, Storrs, Connecticut.

Statistics in Medicine | December 12, 2024
Summary
This summary is machine-generated.

Generalized estimating equations (GEE) analysis for clustered data is improved by a new method that correctly estimates variance and correlation. This approach enhances model selection for mean, variance, and correlation structures.

Keywords:
generalized estimating equations, model selection criterion, sandwich estimator, working covariance structure



Area of Science:

  • Statistics
  • Biostatistics
  • Econometrics

Background:

  • Generalized estimating equations (GEE) are crucial for analyzing clustered data without assuming full multivariate distributions.
  • Recent methods, like Luo and Pan's, jointly model mean, variance, and correlation.
  • These models are specific cases of Yan and Fine's more general estimating equations.
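For orientation, the joint mean, scale, and correlation setup described in the bullets above can be sketched as three link models and three estimating equations. The notation below is an assumption chosen for illustration (following the general form commonly associated with Yan and Fine's estimating equations), not a transcription of the paper:

```latex
% Link models for the mean, the scale (dispersion), and the correlation
g_1(\mu_{ij}) = x_{ij}^\top \beta, \qquad
g_2(\phi_{ij}) = z_{ij}^\top \gamma, \qquad
g_3(\rho_{ijk}) = w_{ijk}^\top \alpha,
\qquad \operatorname{Var}(y_{ij}) = \phi_{ij}\, v(\mu_{ij}).

% One estimating equation per parameter block, summed over clusters i = 1, ..., K
\sum_{i=1}^{K} D_{1i}^\top V_{1i}^{-1} (y_i - \mu_i) = 0, \qquad
\sum_{i=1}^{K} D_{2i}^\top V_{2i}^{-1} (s_i - \phi_i) = 0, \qquad
\sum_{i=1}^{K} D_{3i}^\top V_{3i}^{-1} (r_i - \rho_i) = 0.

% Working responses for the scale and correlation equations
s_{ij} = \frac{(y_{ij} - \mu_{ij})^2}{v(\mu_{ij})}, \qquad
r_{ijk} = e_{ij} e_{ik}, \qquad
e_{ij} = \frac{y_{ij} - \mu_{ij}}{\sqrt{\phi_{ij}\, v(\mu_{ij})}}.
```

Here D_{1i}, D_{2i}, D_{3i} are the derivative matrices of the mean, scale, and correlation vectors with respect to β, γ, and α, and V_{1i}, V_{2i}, V_{3i} are working covariance matrices. In this form the scale equation depends on β through s_i, and the correlation equation depends on both β and γ through r_i, which is one way to read the nested dependencies mentioned in the summary.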

Purpose of the Study:

  • To address limitations in Luo and Pan's variance and correlation estimation for clustered data.
  • To introduce a novel model selection criterion for simultaneous mean-scale-correlation model selection.
  • To extend the geepack R package for enhanced covariance matrix flexibility.

Main Methods:

  • Characterizing model settings where Luo and Pan's variance estimators face challenges.
  • Illustrating how Yan and Fine's estimators correctly handle nested dependencies.
  • Developing and applying a new model selection criterion.
  • Utilizing sandwich variance estimators and simulation studies.
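As a rough illustration of what a sandwich variance estimator does, the sketch below fits only the mean model of a GEE under two simplifying assumptions: an identity link and an independence working correlation. It is not geepack's implementation and not the estimator proposed in the paper; the function name `gee_independence_sandwich` and the simulated data are hypothetical.

```python
import numpy as np


def gee_independence_sandwich(X, y, cluster):
    """Linear-mean GEE with an independence working correlation.

    Returns (beta, naive_cov, robust_cov), where robust_cov is the
    cluster-robust sandwich estimate bread^-1 @ meat @ bread^-1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)

    # With an identity link and independence working correlation, the
    # estimating equation sum_i X_i^T (y_i - X_i beta) = 0 is least squares.
    bread = X.T @ X
    beta = np.linalg.solve(bread, X.T @ y)
    resid = y - X @ beta

    # "Meat": sum over clusters of the outer product of cluster-level scores.
    p = X.shape[1]
    meat = np.zeros((p, p))
    for g in np.unique(cluster):
        idx = cluster == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)

    bread_inv = np.linalg.inv(bread)
    robust_cov = bread_inv @ meat @ bread_inv

    # Model-based covariance that treats all observations as independent.
    sigma2 = resid @ resid / (len(y) - p)
    naive_cov = sigma2 * bread_inv
    return beta, naive_cov, robust_cov


if __name__ == "__main__":
    # Hypothetical clustered data: 50 clusters of size 4 sharing a random
    # intercept, with a covariate that varies only between clusters.
    rng = np.random.default_rng(1)
    n_clusters, size = 50, 4
    cluster = np.repeat(np.arange(n_clusters), size)
    x = np.repeat(rng.normal(size=n_clusters), size)
    u = np.repeat(rng.normal(size=n_clusters), size)
    y = 1.0 + 0.5 * x + u + rng.normal(size=n_clusters * size)
    X = np.column_stack([np.ones_like(x), x])

    beta, naive_cov, robust_cov = gee_independence_sandwich(X, y, cluster)
    print("beta:     ", beta)
    print("naive SE: ", np.sqrt(np.diag(naive_cov)))
    print("robust SE:", np.sqrt(np.diag(robust_cov)))
```

The robust covariance stays valid when the working correlation is misspecified, which is why sandwich estimators are the usual choice for GEE inference. Extending the same idea to the scale and correlation parameters requires stacking all three sets of estimating equations so that the dependence of later equations on earlier parameters is carried into the bread matrix.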

Main Results:

  • Identified specific scenarios where Luo and Pan's variance estimation approach is limited.
  • Demonstrated the effectiveness of Yan and Fine's estimators in accounting for dependencies.
  • Validated the proposed model selection criterion through simulations and real data.
  • Extended geepack with new options for working covariance matrices.

Conclusions:

  • The proposed methods offer improved variance estimation and model selection for clustered data analysis.
  • The enhanced geepack package provides greater flexibility in analyzing complex correlated data structures.
  • This work advances the application of generalized estimating equations in statistical modeling.