Comparing Two Categorical Gini Correlations with Applications to Classification Problems.

Sameera Hewage¹, Yongli Sang²

  • ¹Department of Physical Sciences & Mathematics, West Liberty University, West Liberty, WV 26074, USA.

Statistical Papers (Berlin, Germany) | June 12, 2026 | PubMed
Summary
This summary is machine-generated.

This study introduces a new framework for comparing predictor importance in classification using the categorical Gini correlation (CGC). The method evaluates numerical predictors for categorical outcomes and is illustrated on real-world datasets.

Keywords:
62H15; 62H20; breast cancer prediction; categorical Gini correlation; classification methods; feature importance; model-free inference

Area of Science:

  • Statistics
  • Machine Learning
  • Data Science

Background:

  • Assessing predictor importance is crucial in classification tasks.
  • Existing methods may have limitations with categorical outcomes and complex predictor structures.
  • The categorical Gini correlation (CGC) offers a measure of dependence between numerical predictors and categorical outcomes.
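To make the CGC concrete, here is a minimal Python sketch of a sample estimator. The summary does not state a formula, so this assumes the definition used in the Gini-correlation literature: with Δ the expected Euclidean distance between two independent draws of X, Δ_k the same quantity within class k, and p_k the class proportions, the CGC is (Δ - Σ_k p_k Δ_k) / Δ. The function names and the all-pairs (U-statistic style) averaging are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict
from itertools import combinations


def gini_mean_difference(points):
    """Estimate E||X - X'|| by averaging Euclidean distances over all pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations per estimate")
    total = sum(math.dist(a, b) for a, b in combinations(points, 2))
    return total / (n * (n - 1) / 2)


def categorical_gini_correlation(X, y):
    """Sample CGC between numeric rows X (any dimension) and class labels y.

    Assumed definition: (Delta - sum_k p_k * Delta_k) / Delta, where Delta is the
    overall Gini mean difference, Delta_k the within-class one, p_k class shares.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    n = len(X)
    # Split the rows by class label.
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    delta = gini_mean_difference(X)
    if delta == 0:
        raise ValueError("all rows are identical, so the correlation is undefined")
    # Weighted average of within-class Gini mean differences.
    within = sum(len(rows) / n * gini_mean_difference(rows) for rows in groups.values())
    return (delta - within) / delta
```

Because each row may have any length, the same function covers multivariate predictor groups. Well-separated classes push the estimate toward 1, since the within-class distances Δ_k become small relative to Δ.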

Purpose of the Study:

  • To propose an inferential framework for comparing predictor importance in classification problems with categorical response variables.
  • To extend the application of the categorical Gini correlation (CGC) for hypothesis testing on predictor importance.
  • To provide a robust methodology that handles predictors of arbitrary dimension, including predictors that depend on one another.

Main Methods:

  • The framework utilizes the categorical Gini correlation (CGC) to quantify predictor-outcome association.
  • Predictor importance is assessed by testing differences in CGCs between predictor groups.
  • Asymptotic normality of the test statistic is derived, and a nonparametric bootstrap procedure is developed for inference.
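The testing idea in the Main Methods can be sketched as follows: compute the sample CGC for two predictor groups on the same observations, take the difference, and resample whole rows with replacement so that the dependence between the groups is preserved. This is a hypothetical illustration only. The helper names `cgc`, `select`, and `compare_groups`, the skipping of degenerate resamples, the use of a bootstrap standard error inside a normal approximation, and the approximate percentile interval are all assumptions; the authors' asymptotic variance and exact bootstrap procedure are not given in this summary. The CGC definition is the same assumed one, (Δ - Σ_k p_k Δ_k) / Δ.

```python
import math
import random
from collections import defaultdict
from itertools import combinations
from statistics import stdev


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs (U-statistic form)."""
    n = len(points)
    total = sum(math.dist(a, b) for a, b in combinations(points, 2))
    return total / (n * (n - 1) / 2)


def cgc(X, y):
    """Sample CGC under the assumed definition (Delta - sum_k p_k * Delta_k) / Delta.

    Returns None for degenerate inputs (fewer than two rows in some class, or all
    rows identical) so that such bootstrap replicates can be skipped.
    """
    if len(X) < 2:
        return None
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    if any(len(rows) < 2 for rows in groups.values()):
        return None
    delta = mean_pairwise_distance(X)
    if delta == 0:
        return None
    n = len(X)
    within = sum(len(rows) / n * mean_pairwise_distance(rows) for rows in groups.values())
    return (delta - within) / delta


def select(X, cols):
    """Keep only the predictor columns in `cols` (one predictor group)."""
    return [[row[c] for c in cols] for row in X]


def compare_groups(X, y, cols_a, cols_b, n_boot=500, alpha=0.05, seed=0):
    """Hypothetical helper: bootstrap test of CGC(group A) - CGC(group B) = 0."""
    a, b = cgc(select(X, cols_a), y), cgc(select(X, cols_b), y)
    if a is None or b is None:
        raise ValueError("each class needs at least two rows and X must not be constant")
    diff = a - b
    rng = random.Random(seed)
    n = len(X)
    reps = []
    for _ in range(n_boot):
        # Resample whole rows so both predictor groups and the label stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        ab, bb = cgc(select(Xb, cols_a), yb), cgc(select(Xb, cols_b), yb)
        if ab is not None and bb is not None:
            reps.append(ab - bb)
    if len(reps) < 2:
        raise ValueError("too few usable bootstrap replicates")
    se = stdev(reps)
    if se == 0:
        raise ValueError("bootstrap standard error is zero")
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal-approximation p-value
    # Approximate percentile interval from the sorted bootstrap differences.
    reps.sort()
    lo = reps[int(alpha / 2 * (len(reps) - 1))]
    hi = reps[int(math.ceil((1 - alpha / 2) * (len(reps) - 1)))]
    return {"diff": diff, "se": se, "z": z, "p_value": p_value, "ci": (lo, hi)}
```

Resampling rows rather than columns keeps each observation's predictors and label together, which is what allows the two groups to have unequal dimensions and to be arbitrarily dependent. Each CGC evaluation costs O(n²) distance computations, so larger samples would call for vectorized or subsampled estimates.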

Main Results:

  • The proposed inferential framework effectively compares predictor importance for categorical outcomes.
  • The methodology accommodates predictor groups of arbitrary and unequal dimensions, as well as dependence between predictors.
  • Theoretical properties, including asymptotic normality and consistency, are established for the test statistic.

Conclusions:

  • The developed framework provides a statistically sound method for evaluating predictor importance in classification with categorical variables.
  • The approach is validated through simulation studies and applications to breast cancer prediction and human activity recognition.
  • This work offers a valuable tool for researchers and practitioners in machine learning and data analysis.