DCC: A Model-Free Frame to Evaluate Data Set Quality.

Chunhui Xie¹, Yunqi Li¹

  • ¹Department of Polymer Materials and Engineering, College of Materials and Metallurgy, Guizhou University, Guiyang 550025, P.R. China.

Journal of Chemical Information and Modeling | December 9, 2025
Summary
This summary is machine-generated.

We introduce Data Correlation Convergence (DCC), a novel framework to assess data set quality. DCC quantifies data stability under perturbations, offering a computationally efficient alternative to traditional methods for evaluating data completeness and representativeness.

Area of Science:

  • Data Science
  • Materials Science
  • Statistical Modeling

Background:

  • Evaluating data set quality is crucial for reliable analysis and model performance.
  • Existing methods for data quality assessment are often computationally intensive and model-dependent.
  • There is a need for a theoretically grounded and widely applicable framework to evaluate data completeness and representativeness.

Purpose of the Study:

  • To propose the Data Correlation Convergence (DCC) framework for evaluating data set quality.
  • To offer an alternative to conventional computation-intensive and model-dependent approaches.
  • To quantify data set stability under perturbations, reflecting completeness and representativeness.

Main Methods:

  • DCC integrates multiple correlation functions to quantify numeric correlations and distributional similarities.
  • The framework hypothesizes that high-quality data sets exhibit stable correlation patterns under perturbations.
  • Hypothetical and benchmark data sets were used to validate the DCC framework's efficacy.
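
The idea of checking whether correlation patterns stay stable under perturbation can be illustrated with a minimal sketch. This is not the DCC formula, which this summary does not give. It assumes a single correlation function (Pearson), a single kind of perturbation (random row subsampling), and a hypothetical helper named `correlation_drift`, all chosen here for illustration only.

```python
import numpy as np


def correlation_drift(data, n_trials=50, keep_fraction=0.8, seed=0):
    """Mean absolute change in pairwise Pearson correlations under subsampling.

    Hypothetical illustration only: this is NOT the published DCC metric.
    A small value means the correlation pattern barely moves when rows
    are dropped at random.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape

    # Correlation matrix of the full data set (columns are features).
    reference = np.corrcoef(data, rowvar=False)
    off_diagonal = ~np.eye(n_cols, dtype=bool)

    # Number of rows kept in each perturbed copy.
    n_keep = min(n_rows, max(3, int(round(keep_fraction * n_rows))))

    drifts = []
    for _ in range(n_trials):
        rows = rng.choice(n_rows, size=n_keep, replace=False)
        perturbed = np.corrcoef(data[rows], rowvar=False)
        drifts.append(np.mean(np.abs(perturbed - reference)[off_diagonal]))
    return float(np.mean(drifts))


# Two synthetic two-column data sets (made up for this example).
rng = np.random.default_rng(1)
x = rng.normal(size=500)
linear = np.column_stack([x, 2.0 * x + 1.0])         # exact linear relation
noisy = np.column_stack([x, rng.normal(size=500)])   # unrelated columns

drift_linear = correlation_drift(linear)
drift_noisy = correlation_drift(noisy)
print(drift_linear, drift_noisy)
```

In this sketch the exactly linear pair keeps a correlation of 1 in every subsample, so its drift is essentially zero, while the unrelated pair shows a larger drift because its sample correlation fluctuates from one subsample to the next. How the published framework combines several correlation functions and distributional similarities into one value is not described in this summary.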

Main Results:

  • DCC values were lowest at 10-20% linear correlation and increased as the correlations became more determinative.
  • DCC values effectively predict performance metrics (e.g., accuracy, R-squared) and feature importance (SHAP values) for machine learning models.
  • DCC can efficiently compress data sets by capturing inherent correlation patterns.

Conclusions:

  • The DCC framework provides a theoretically grounded, widely applicable, and extensible method for data set quality evaluation.
  • DCC offers insights into data completeness, representativeness, and potential biases.
  • This approach facilitates better data annotation and selection for scientific research and machine learning applications.