Related Concept Videos

Kendall's Coefficient of Concordance (01:20)
Kendall's Coefficient of Concordance (W), also known as Kendall's W, is a non-parametric statistical measure used to assess the agreement or concordance between multiple raters or judges when they rank a set of items. It is often used when you have ordinal data (ranks) and you want to see if there is consistency or consensus among the raters. It is widely applied in research areas such as psychology, medicine, and social sciences, where multiple judges are asked to rank or rate subjects...
542
Expected Frequencies in Goodness-of-Fit Tests (01:19)
A goodness-of-fit test is conducted to determine whether the observed frequency values are statistically similar to the frequencies expected for the dataset. Suppose the expected frequencies for a dataset are equal, such as when predicting the frequency of any number appearing when casting a die. In that case, the expected frequency is the ratio of the total number of observations (n) to the number of categories (k).
2.6K
Goodness-of-Fit Test (01:16)
The goodness-of-fit test is a type of hypothesis test which determines whether the data "fits" a particular distribution. For example, one may suspect that some anonymous data may fit a binomial distribution. A chi-square test (meaning the distribution for the hypothesis test is chi-square) can be used to determine if there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...
4.1K
Quantifying and Rejecting Outliers: The Grubbs Test (01:02)
Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...
2.1K
Spearman's Rank Correlation Test (01:20)
Spearman's rank correlation test, also known as Spearman's rho, is a nonparametric method for assessing the strength and direction of association between two variables. This test is particularly valuable when the data distribution is unknown or when the assumption of normality does not hold. Named after the English psychologist and statistician Dr. Charles Edward Spearman, it serves as the nonparametric counterpart to Pearson's correlation coefficient.
Spearman's test calculates...
1.0K
Kendall's Tau Test (01:16)
Kendall's tau test, also known as the Kendall rank coefficient test, is a nonparametric method for assessing association between two variables. This test is particularly useful for identifying significant correlations when the distributions of the sample and population are unknown. Developed in 1938 by the British statistician Sir Maurice George Kendall, the tau coefficient (denoted as τ) serves as a rank correlation coefficient, with values ranging from -1 to +1.
A τ value...
836

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Analyzing childhood (0-59 months) malnutrition determinants in five West African countries of Gabon, Gambia, Liberia, Mauritania, and Nigeria using a generalized additive mixed model from DHS data.

Frontiers in nutrition·2025
Same author

Determinants of childhood anaemia in Nigeria: a public health perspective using quantile regression analysis.

BMC public health·2025
Same author

Corrigendum to "Greater cane rat algorithm (GCRA): A nature-inspired metaheuristic for optimization problems" [Heliyon Volume 10, Issue 11, June 2024, Article e31629].

Heliyon·2025
Same author

Correction: Artificial intelligence-driven translational medicine: a machine learning framework for predicting disease outcomes and optimizing patient-centric care.

Journal of translational medicine·2025
Same author

Determinants associated with anemia level among children under 5 years in Gambia: a structural equation modelling approach.

Frontiers in public health·2025
Same author

Quantile regression application to identify key determinants of malnutrition in five West African countries of Gabon, Gambia, Liberia, Mauritania, and Nigeria.

Frontiers in public health·2025
Same journal

Clinical crown height changes in mandibular anterior teeth retained with two types of fixed retainers over two years: findings from a randomized clinical trial.

Scientific reports·2026
Same journal

Rethinking water governance through indigenous systems: A comparative assessment of qanat and well irrigation productivity in Sabzevar County, Iran.

Scientific reports·2026
Same journal

Distributed Nash equilibrium seeking for second-order systems with finite/fixed-time convergence in the absence of velocity measurement.

Scientific reports·2026
Same journal

Determinants of pregnancy termination among ever-married women of reproductive age in Bangladesh.

Scientific reports·2026
Same journal

Occurrence and human health risk assessment of organochlorine pesticides in irrigated and non-irrigated agricultural soils of Wondogenet District, Ethiopia.

Scientific reports·2026
Same journal

High angular resolution diffusion imaging of neurodevelopment in children through data creation with deep learning.

Scientific reports·2026

Related Experiment Video

Updated: Sep 17, 2025

Large-scale Reconstructions and Independent, Unbiased Clustering Based on Morphological Metrics to Classify Neurons in Selective Populations (12:27)

Published on: February 15, 2017

7.1K

Benchmarking validity indices for evolutionary K-means clustering performance.

Abiodun M Ikotun1, Faustin Habyarimana1, Absalom E Ezugwu2

  • 1School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, Pietermaritzburg Campus, Durban, South Africa.

Scientific Reports | July 2, 2025
Summary
This summary is machine-generated.

This study evaluated internal validity indices for Evolutionary K-Means clustering. The Calinski-Harabasz (CH) and Silhouette indices proved most effective for automatic clustering tasks.

Keywords:
Automatic clustering; Cluster validity indices; Clustering algorithms; Evolutionary k-means; K-means; Metaheuristic optimisation

More Related Videos

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances (07:35)

Published on: October 11, 2018

7.6K
A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments (08:12)

Published on: March 1, 2022

2.6K

Area of Science:

  • Data Science
  • Machine Learning
  • Artificial Intelligence

Background:

  • K-Means clustering requires a predefined number of clusters, limiting its use in automatic data analysis.
  • Evolutionary K-Means (E-KM) algorithms integrate metaheuristics to overcome K-Means limitations, using internal validity indices for automatic cluster determination.
  • The performance of internal validity indices is data-dependent, impacting the reliability of E-KM outcomes.

Purpose of the Study:

  • To evaluate the performance of fifteen internal validity indices within the Enhanced Firefly Algorithm-K-Means (FA-K-Means) framework.
  • To identify the most effective internal validity indices for automatic clustering tasks using an evolutionary approach.
  • To provide practical guidance on selecting fitness functions for E-KM algorithms.

Main Methods:

  • The study employed the Enhanced Firefly Algorithm-K-Means (FA-K-Means) framework, combining Firefly metaheuristics with K-Means.
  • Fifteen distinct internal validity indices were assessed as fitness functions within the FA-K-Means framework.
  • Performance was evaluated across a variety of real-life and synthetic datasets with diverse structural properties.
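
To make the idea of an internal validity index acting as a fitness function more concrete, the following is a minimal sketch in plain Python. It is not the authors' FA-K-Means implementation: the Firefly search is omitted entirely, the toy data and the two candidate labelings are hypothetical, and the function names are illustrative. It only shows the standard Calinski-Harabasz definition (between-cluster dispersion over within-cluster dispersion, each divided by its degrees of freedom) being used to score candidate partitions so that the higher-scoring one is preferred.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """CH index: (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    n, k = len(points), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("CH index needs 2 <= k < n")
    overall = centroid(points)
    between = within = 0.0
    for members in clusters.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    if within == 0.0:
        return float("inf")  # every point coincides with its cluster centroid
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two tight, well-separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

# Hypothetical candidate partitions, keyed by their number of clusters.
candidates = {
    2: [0, 0, 0, 1, 1, 1],  # the natural split
    3: [0, 0, 1, 2, 2, 2],  # an over-segmented alternative
}

# Use the index as a fitness value: higher CH means a better partition.
scores = {k: calinski_harabasz(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

In an evolutionary setting, a metaheuristic would generate and refine such candidate partitions; here they are fixed by hand so that only the scoring step is shown.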

Main Results:

  • The Calinski-Harabasz (CH) index demonstrated consistently strong performance across various datasets.
  • The Silhouette index also showed robust and reliable performance in determining optimal clustering configurations.
  • Other evaluated indices exhibited variable effectiveness, often dependent on specific dataset characteristics.
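
For readers unfamiliar with the Silhouette index, here is a rough, stdlib-only Python sketch of its standard definition: for each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The data, labelings, and function name are hypothetical and are not taken from the study; singleton clusters are given a score of 0, which is a common convention assumed here.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 clusters")

    def mean_dist(i, members):
        # Mean Euclidean distance from point i to the listed members (excluding i).
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # assumed convention: a singleton contributes s(i) = 0
        a = mean_dist(i, own)
        b = min(mean_dist(i, m) for lab, m in clusters.items() if lab != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical 1-D toy data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # groups match the structure
bad = silhouette_score(points, [0, 1, 0, 1])   # groups cut across the structure
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own, which is why the index can rank competing clusterings.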

Conclusions:

  • The Calinski-Harabasz (CH) and Silhouette indices are recommended for use as fitness functions in Evolutionary K-Means algorithms.
  • These indices offer more reliable and consistent performance in automatic clustering tasks than the other indices evaluated.
  • The findings offer practical insights for researchers and practitioners selecting validity indices in E-KM applications.