Updated: Jun 13, 2026

Comparing Two Categorical Gini Correlations with Applications to Classification Problems.

Sameera Hewage¹, Yongli Sang²

¹Department of Physical Sciences & Mathematics, West Liberty University, West Liberty, WV 26074, USA.

Statistical Papers (Berlin, Germany)

June 12, 2026

Summary

This summary is machine-generated.

This study introduces an inferential framework for comparing predictor importance in classification using the categorical Gini correlation (CGC). The method evaluates numerical predictors against categorical outcomes and is demonstrated on real-world datasets.

Keywords:

62H15; 62H20; breast cancer prediction; categorical Gini correlation; classification methods; feature importance; model-free inference

Area of Science:

Statistics
Machine Learning
Data Science

Background:

Assessing predictor importance is crucial in classification tasks.
Existing methods may have limitations with categorical outcomes and complex predictor structures.
The categorical Gini correlation (CGC) offers a measure of dependence between numerical predictors and categorical outcomes.
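
For orientation, the CGC is usually defined in the Gini correlation literature as the share of overall pairwise distance that is not explained by within-class distance. The summary here does not state the formula, so the expression below and its notation (\Delta, \Delta_k, p_k, \rho_g) are assumptions drawn from that general literature, not from this paper's text.

```latex
\rho_g(X, Y) \;=\; \frac{\Delta - \sum_{k=1}^{K} p_k \, \Delta_k}{\Delta},
\qquad
\Delta = \mathbb{E}\,\lVert X_1 - X_2 \rVert,
\quad
\Delta_k = \mathbb{E}\,\lVert X_1^{(k)} - X_2^{(k)} \rVert,
\quad
p_k = P(Y = k)
```

Here X_1 and X_2 are independent copies of the numerical predictor X, and X_1^{(k)}, X_2^{(k)} are independent copies drawn from class k. Under this definition the population value lies between 0 and 1, with values near 0 indicating little dependence between X and Y.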

Purpose of the Study:

To propose an inferential framework for comparing predictor importance in classification problems with categorical response variables.
To extend the application of the categorical Gini correlation (CGC) for hypothesis testing on predictor importance.
To provide a robust methodology that handles predictors of arbitrary dimensions and dependencies.

Main Methods:

The framework utilizes the categorical Gini correlation (CGC) to quantify predictor-outcome association.
Predictor importance is assessed by testing differences in CGCs between predictor groups.
Asymptotic normality of the test statistic is derived, and a nonparametric bootstrap procedure is developed for inference.
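
The steps listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the distance-based CGC definition (overall mean pairwise distance minus the class-weighted within-class mean pairwise distance, divided by the overall value), and it uses a simple percentile bootstrap interval, which may differ from the procedure in the paper. The function names `gini_mean_difference`, `categorical_gini_correlation`, and `bootstrap_cgc_difference` are hypothetical.

```python
import numpy as np


def gini_mean_difference(x):
    """Mean Euclidean distance over all distinct pairs of rows in x."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D predictor as n rows of dimension 1
    n = x.shape[0]
    if n < 2:
        return 0.0  # no pairs available
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # The diagonal is zero, so dividing by n*(n-1) averages over i != j.
    return dists.sum() / (n * (n - 1))


def categorical_gini_correlation(x, y):
    """Sample CGC between numerical predictor(s) x and categorical labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    overall = gini_mean_difference(x)
    if overall == 0.0:
        return 0.0  # degenerate predictor: all observations identical
    within = 0.0
    for label in np.unique(y):
        mask = y == label
        # mask.mean() is the sample proportion of this class.
        within += mask.mean() * gini_mean_difference(x[mask])
    return (overall - within) / overall


def bootstrap_cgc_difference(x1, x2, y, n_boot=500, alpha=0.05, seed=0):
    """Observed CGC(x1, y) - CGC(x2, y) and a percentile bootstrap interval.

    x1 and x2 may have different dimensions; rows are resampled jointly
    with y so that dependence between the two predictor groups is kept.
    """
    rng = np.random.default_rng(seed)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y)
    n = len(y)
    observed = (categorical_gini_correlation(x1, y)
                - categorical_gini_correlation(x2, y))
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        diffs[b] = (categorical_gini_correlation(x1[idx], y[idx])
                    - categorical_gini_correlation(x2[idx], y[idx]))
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return observed, (lower, upper)
```

A bootstrap interval for the difference that excludes zero would suggest that the two predictor groups differ in importance. Note that resampling with replacement creates duplicate rows with zero distance, and the pairwise distance matrix needs memory quadratic in the sample size, so this sketch is only suitable for small datasets.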

Main Results:

The proposed inferential framework effectively compares predictor importance for categorical outcomes.
The methodology accommodates predictors with arbitrary and unequal dimensions, and inter-predictor dependencies.
Theoretical properties including asymptotic normality and consistency are established for the test statistic.

Conclusions:

The developed framework provides a statistically sound method for evaluating predictor importance in classification with categorical response variables.
The approach is validated through simulation studies and practical applications in breast cancer prediction and human activity recognition.
This work offers a valuable tool for researchers and practitioners in machine learning and data analysis.