



Updated: Jan 9, 2026


DCC: A Model-Free Frame to Evaluate Data Set Quality.

Chunhui Xie¹, Yunqi Li¹

¹Department of Polymer Materials and Engineering, College of Materials and Metallurgy, Guizhou University, Guiyang 550025, P.R. China.

Journal of Chemical Information and Modeling

December 9, 2025

Summary

This summary is machine-generated.

We introduce Data Correlation Convergence (DCC), a novel framework to assess data set quality. DCC quantifies data stability under perturbations, offering a computationally efficient alternative to traditional methods for evaluating data completeness and representativeness.


Area of Science:

Data Science
Materials Science
Statistical Modeling

Background:

Evaluating data set quality is crucial for reliable analysis and model performance.
Existing methods for data quality assessment are often computationally intensive and model-dependent.
There is a need for a theoretically grounded and widely applicable framework to evaluate data completeness and representativeness.

Purpose of the Study:

To propose the Data Correlation Convergence (DCC) framework for evaluating data set quality.
To offer an alternative to conventional computation-intensive and model-dependent approaches.
To quantify data set stability under perturbations, reflecting completeness and representativeness.

Main Methods:

DCC integrates multiple correlation functions to quantify numeric correlations and distributional similarities.
The framework hypothesizes that high-quality data sets exhibit stable correlation patterns under perturbations.
Hypothetical and benchmark data sets were used to validate the DCC framework's efficacy.
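
The idea that a good data set keeps its correlation pattern when perturbed can be illustrated with a small sketch. This is not the DCC metric itself: this summary does not give the DCC formula or the correlation functions it combines. The example below assumes a single Pearson correlation, random row subsampling as the perturbation, and a hypothetical score, `correlation_spread`, defined as the standard deviation of the correlation across subsamples. All function names, parameters, and the synthetic data are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # convention chosen here for constant columns
    return sxy / sqrt(sxx * syy)


def correlation_spread(xs, ys, n_trials=200, keep=0.5, seed=0):
    """Hypothetical stability score: std. dev. of Pearson r over subsamples.

    Each trial keeps a random fraction `keep` of the rows (the perturbation)
    and recomputes r. A small spread means the correlation is stable.
    """
    rng = random.Random(seed)
    n = len(xs)
    k = min(n, max(3, int(keep * n)))
    rs = []
    for _ in range(n_trials):
        idx = rng.sample(range(n), k)
        rs.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    mean_r = sum(rs) / len(rs)
    return sqrt(sum((r - mean_r) ** 2 for r in rs) / len(rs))


def make_data(n, noise, seed=1):
    """Synthetic data (assumption): y = x + noise * e, x and e standard normal."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + noise * rng.gauss(0, 1) for x in xs]
    return xs, ys


if __name__ == "__main__":
    # Print noise level, full-sample r, and the spread of r under subsampling.
    for noise in (0.1, 1.0, 5.0):
        xs, ys = make_data(400, noise)
        print(noise, round(pearson(xs, ys), 3), round(correlation_spread(xs, ys), 4))
```

In this toy setting, a nearly deterministic relationship gives a correlation that barely changes between subsamples, while a weak, noisy relationship gives a visibly larger spread. How the actual DCC framework aggregates several correlation functions and distributional similarities is described in the paper, not here.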

Main Results:

DCC values were lowest for data sets with 10-20% linear correlations and increased as correlations became more determinative.
DCC values effectively predict performance metrics (e.g., accuracy, R-squared) and feature importance (SHAP values) for machine learning models.
DCC can efficiently compress data sets by capturing inherent correlation patterns.

Conclusions:

The DCC framework provides a theoretically grounded, widely applicable, and extensible method for data set quality evaluation.
DCC offers insights into data completeness, representativeness, and potential biases.
This approach facilitates better data annotation and selection for scientific research and machine learning applications.