Updated: May 24, 2025

Measuring the Validity of Clustering Validation Datasets.

Hyeon Jeon, Michael Aupetit, DongHwa Shin

IEEE Transactions on Pattern Analysis and Machine Intelligence

March 4, 2025

Summary

This summary is machine-generated.

This study introduces Adjusted Internal Validation Measures (Adjusted IVMs), which assess how well a dataset's class labels match its inherent clusters, a property called cluster-label matching (CLM). Unlike standard IVMs, the adjusted measures can be compared across datasets as well as within one, which makes clustering benchmarks more reliable.

Area of Science:

Data Science
Machine Learning
Statistical Analysis

Background:

Clustering validation often relies on benchmark datasets with predefined class labels.
Class labels may not accurately represent inherent data clusters, compromising validation accuracy.
Existing internal validation measures (IVMs) can compare cluster-label matching (CLM) only within a single dataset, not across datasets.
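
To make the idea of scoring CLM with an IVM concrete, here is a minimal sketch that treats class labels as cluster assignments and computes the Calinski-Harabasz index, a commonly used IVM. This summary does not say which six IVMs the study adjusts, so the choice of index, the toy points, and the two labelings are all assumptions made for illustration.

```python
from collections import defaultdict


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(points, labels):
    """Calinski-Harabasz index, using the class labels as cluster assignments."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = 0.0  # size-weighted spread of class centroids around the global centroid
    within = 0.0   # spread of points around their own class centroid
    for members in groups.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))


# Toy data (illustrative only): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
good_labels = ["a", "a", "a", "b", "b", "b"]  # labels follow the two groups
bad_labels = ["a", "b", "a", "b", "a", "b"]   # labels cut across the groups

good = calinski_harabasz(points, good_labels)
bad = calinski_harabasz(points, bad_labels)
print(good > bad)  # labels that follow the groups score higher
```

Within this one dataset the two scores can be compared directly. The raw values also depend on the number of points and classes, which is why comparing them across datasets is not straightforward.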

Purpose of the Study:

To develop reliable methods for evaluating and comparing cluster-label matching (CLM) across diverse datasets.
To introduce Adjusted IVMs that are independent of dataset-specific properties unrelated to cluster structure.
To establish standardized protocols for converting existing IVMs into adjusted versions.

Main Methods:

Defined four axioms for validation measures, ensuring independence from data properties like dimensionality and size.
Developed standardized protocols to adapt any IVM to satisfy these axioms.
Applied protocols to adjust six widely used IVMs, creating Adjusted IVMs.
Conducted quantitative experiments to assess the performance of Adjusted IVMs.
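
The axioms themselves are not spelled out in this summary, but the size-independence idea can be probed with a simple check: duplicate every point, which leaves the cluster structure unchanged, and see whether the score moves. The sketch below is an illustration under that assumption, not the study's protocol. The helper `shift_under_duplication` and the `scatter_ratio` measure are hypothetical names introduced here, and the Calinski-Harabasz index is again used only as a familiar example.

```python
from collections import defaultdict


def scatter(points, labels):
    """Return (between, within, n, k) for points grouped by class label."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    dims = range(len(points[0]))

    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in dims]

    def sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between, within = 0.0, 0.0
    for members in groups.values():
        c = mean(members)
        between += len(members) * sq(c, overall)
        within += sum(sq(p, c) for p in members)
    return between, within, len(points), len(groups)


def calinski_harabasz(points, labels):
    b, w, n, k = scatter(points, labels)
    return (b / (k - 1)) / (w / (n - k))


def scatter_ratio(points, labels):
    """Plain between/within ratio with no size-dependent factor (illustrative)."""
    b, w, _, _ = scatter(points, labels)
    return b / w


def shift_under_duplication(measure, points, labels):
    """Relative change in `measure` when every point is duplicated once."""
    base = measure(points, labels)
    doubled = measure(points * 2, labels * 2)
    return (doubled - base) / base


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

ch_shift = shift_under_duplication(calinski_harabasz, points, labels)
ratio_shift = shift_under_duplication(scatter_ratio, points, labels)
print(ch_shift, ratio_shift)
```

Duplicating the data doubles both scatter terms, so their ratio is unchanged, while the (n - k) factor in the Calinski-Harabasz index makes that score grow with the number of points. A measure with this kind of dependence cannot be compared fairly between datasets of different sizes.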

Main Results:

Adjusted IVMs effectively evaluate and compare CLM both within and across datasets.
The proposed adjustment protocols are necessary and significantly improve validation accuracy.
Adjusted IVMs outperform standard IVMs and other competitors in assessing CLM.
The method allows for filtering and improving datasets to create more reliable clustering benchmarks.
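
Once CLM scores are comparable across datasets, filtering a benchmark collection reduces to a threshold and a sort. The sketch below assumes that higher scores mean better cluster-label matching. The dataset names, scores, and threshold are placeholders, not values from the study, and `filter_benchmarks` is a hypothetical helper.

```python
def filter_benchmarks(clm_scores, threshold):
    """Keep datasets whose CLM score meets the threshold, best first."""
    kept = [(name, score) for name, score in clm_scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Placeholder scores for three hypothetical datasets (not results from the study).
scores = {"dataset_a": 0.82, "dataset_b": 0.35, "dataset_c": 0.61}

selected = filter_benchmarks(scores, threshold=0.6)
print(selected)  # [('dataset_a', 0.82), ('dataset_c', 0.61)]
```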

Conclusions:

Adjusted IVMs provide a fast, reliable, and standardized approach for evaluating cluster-label matching across datasets.
This work enhances the reliability of benchmark datasets used for clustering validation.
The proposed methods offer a significant advancement in the field of unsupervised learning validation.