Related Concept Videos

Detection of Gross Error: The Q Test

When one or more data points lie far from the rest of the data, we need to determine whether they are outliers and whether they should be eliminated from the data set, so that the data accurately represent the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...
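
The Q test is commonly computed as Dixon's Q. This is the gap between the suspect value and its nearest neighbor, divided by the range of the data. The sketch below assumes that form. The names `dixon_q` and `q_test` are illustrative. The critical value `q_crit` is not computed here: look it up in a Q table for your sample size and confidence level.

```python
def dixon_q(values):
    """Return (suspect, Q) for whichever end of the sorted data is more extreme.

    Q = |suspect - nearest neighbor| / (max - min)
    """
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("the Q test needs at least 3 values")
    spread = data[-1] - data[0]
    if spread == 0:
        raise ValueError("all values are identical; Q is undefined")
    q_low = (data[1] - data[0]) / spread     # gap at the low end
    q_high = (data[-1] - data[-2]) / spread  # gap at the high end
    if q_low > q_high:
        return data[0], q_low
    return data[-1], q_high


def q_test(values, q_crit):
    """Flag the suspect value as an outlier when Q exceeds the tabulated critical value."""
    suspect, q = dixon_q(values)
    return suspect, q, q > q_crit


if __name__ == "__main__":
    # Five replicate measurements with one suspiciously high reading (illustrative data).
    suspect, q, reject = q_test([10.2, 10.3, 10.1, 10.4, 12.0], q_crit=0.710)
    print(suspect, round(q, 3), reject)
```

In this example Q ≈ 0.842, which exceeds 0.710, a commonly tabulated 95% critical value for n = 5. The reading 12.0 would therefore be flagged as a probable gross error.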

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes a data set contains a recorded numerical observation that deviates greatly from the rest of the data. Assuming that the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the mean. Then calculate the ratio of this difference to the sample standard deviation. This...
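
The two steps above define the Grubbs statistic G = max|x_i − mean| / s, where s is the sample standard deviation. The sketch below computes G with the Python standard library. The names `grubbs_statistic` and `grubbs_test` are illustrative. The critical value `g_crit` is passed in rather than computed. For a two-sided test it is usually taken from a Grubbs table, or from G_crit = ((n − 1)/√n)·√(t² / (n − 2 + t²)), where t is the critical value of the t distribution with n − 2 degrees of freedom at significance α/(2n).

```python
import statistics


def grubbs_statistic(values):
    """Two-sided Grubbs statistic G = max|x_i - mean| / s.

    s is the sample (n - 1) standard deviation. Returns (suspect, G),
    where suspect is the value farthest from the mean.
    """
    if len(values) < 3:
        raise ValueError("the Grubbs test needs at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    if s == 0:
        raise ValueError("all values are identical; G is undefined")
    suspect = max(values, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / s


def grubbs_test(values, g_crit):
    """Reject the suspect value when G exceeds the critical value for n and alpha."""
    suspect, g = grubbs_statistic(values)
    return suspect, g, g > g_crit
```

For example, `grubbs_statistic([1, 2, 3, 4, 100])` identifies 100 as the suspect value and gives G = 78 / √1902.5 ≈ 1.79.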

Statistical Methods to Analyze Parametric Data: Student t-Test and Goodness-of-Fit Test

In parametric statistics, two fundamental tests stand out for their utility and wide application: the Student's t-test and goodness-of-fit tests. These tests give researchers robust methods for drawing insights from data, testing hypotheses, and making informed decisions based on their findings.
The Student's t-test examines whether there is a statistically significant difference between the means of two groups. This test is instrumental when dealing with...
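
The classic two-sample Student's t-test assumes equal variances in the two groups. It divides the difference of the means by a standard error built from the pooled variance, and uses n_a + n_b − 2 degrees of freedom. The sketch below computes only the statistic with the standard library. The function name `student_t` is illustrative.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic assuming equal variances (pooled).

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least 2 values")
    var_a = statistics.variance(sample_a)  # sample variances (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    if pooled_var == 0:
        raise ValueError("both samples are constant; t is undefined")
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t, df
```

For [1, 2, 3, 4, 5] versus [3, 4, 5, 6, 7], this gives t = -2.0 with 8 degrees of freedom. To decide significance, compare |t| with the two-tailed critical value for df = 8 from a t table. Alternatively, `scipy.stats.ttest_ind` applies the same pooled-variance test by default and also returns a p-value.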

Data Validation

Method validation is a crucial process in analytical chemistry designed to confirm that a given method consistently produces reliable and high-quality results. This process is essential when a method is applied to different sample matrices or when procedural modifications are made, ensuring that the results meet acceptable standards across various applications.
Key parameters for method validation include:

Data Validation

Data validation is an essential part of a comprehensive assessment. Validation means confirming or verifying the collected information. By clarifying vague or unclear data, it also opens the door to gathering further assessment data. The process of checking and verifying the collected information is called data validation. The primary purpose of data validation is to ensure that the data are as free from error, bias, and misinterpretation as possible.
Nursing assessment guides are generally based on holistic models rather than medical...

Goodness-of-Fit Test

The goodness-of-fit test is a type of hypothesis test that determines whether the data "fit" a particular distribution. For example, one may suspect that some anonymous data fit a binomial distribution. A chi-square test (meaning that the distribution for the hypothesis test is chi-square) can be used to determine whether there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...
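
The binomial example above can be made concrete with the standard Pearson form of the chi-square statistic, χ² = Σ (O − E)² / E, summed over the categories. It has k − 1 degrees of freedom when no parameters are estimated from the data, and one fewer for each estimated parameter. The sketch below assumes a binomial model with known p. The helper names `binomial_expected` and `chi_square_gof` are illustrative.

```python
import math


def binomial_expected(n_trials, p, total):
    """Expected counts of 0..n_trials successes among `total` observations under Binomial(n_trials, p)."""
    return [total * math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
            for k in range(n_trials + 1)]


def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

Suppose we record the number of successes in two trials with p = 0.5, for 100 observations. The expected counts are 25, 50, and 25. Observed counts of 30, 45, and 25 give χ² = 1.5 with 2 degrees of freedom. This is well below the 5% critical value of 5.991, so the binomial model is not rejected.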

You May Also Like

Related Articles

Articles related to this one through co-authors, journal, and citation graph.

Same author

Explore Thermal and Mechanical Properties of Biobased Polyurethane Elastomers Through Machine Learning Models.

Macromolecular rapid communications·2026

Same author

Tunable and Photomodifiable Nonisocyanate Polyurethanes from Lignin-Based Cyclic Carbonates Bearing α,β-Unsaturated Ketone.

ACS macro letters·2025

Same author

Castor Oil-Derived Ionic Liquids for Flexible, Antibacterial Biobased Thermosetting Polymers via Thiol-Ene Click Chemistry.

ACS macro letters·2025

Same author

Nanofibrous Hyper-Cross-Linked Polymer Based on Veratraldehyde-Derived Triarylimidazole for Cationic Organic Pollutant Adsorption.

Biomacromolecules·2025

Same author

Cellulose-Wool Keratin Composite Hydrogels as Selective Support Carriers for Gold Nanoparticles: Synthesis and Catalytic Applications in the Reduction of 4-Nitrophenol in Water.

Langmuir : the ACS journal of surfaces and colloids·2025

Same author

Enclose Biobased Content into Polyurethane Elastomers: A Summary of Synthetic Routes and an Inverse Prediction of their Percentages.

Macromolecular rapid communications·2025

Same journal

MolPy: A Large Language Model-Friendly Toolkit for Reactive Topology Editing in Polymer Simulations.

Journal of chemical information and modeling·2026

Same journal

Molecular Mechanisms of KIT Receptor Dimerization and Oncogenic Activation Revealed by Multiscale Simulations.

Journal of chemical information and modeling·2026

Same journal

Structural and Thermodynamic Discrimination between Agonists and Antagonists of Retinoic Acid Receptor γ and the Vitamin D Receptor.

Journal of chemical information and modeling·2026

Same journal

PACEff Builder: An Efficient Platform for Constructing PACE Hybrid-Resolution Models for Molecular Dynamics Simulations of Aqueous Protein, Peptide Assembly, and Membrane Protein Systems.

Journal of chemical information and modeling·2026

Same journal

TransKla: A Local-Global Cross-Attention Based Transformer Approach for Prediction of Lysine Lactylation Sites.

Journal of chemical information and modeling·2026

Same journal

CondenSimAdapter: A Versatile Builder for Multiscale Simulations of Protein Condensates with Broad Force-Field Compatibility and Robust Dense-Phase Relaxation.

Journal of chemical information and modeling·2026

Related Experiment Videos

Updated: Jan 9, 2026

Design and Analysis for Fall Detection System Simplification

Published on: April 6, 2020

DCC: A Model-Free Framework for Evaluating Dataset Quality.

Chunhui Xie¹, Yunqi Li¹

¹Department of Polymer Materials and Engineering, College of Materials and Metallurgy, Guizhou University, Guiyang 550025, P.R. China.

Journal of chemical information and modeling

December 9, 2025

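Summary

This summary is machine-generated.

We introduce Data Correlation Convergence (DCC), a new framework for evaluating dataset quality. DCC quantifies data stability under perturbation. It offers a computationally efficient alternative to traditional methods for assessing data completeness and representativeness.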

More Related Videos

A Machine Learning Approach to Design an Efficient Selective Screening of Mild Cognitive Impairment

Published on: January 11, 2020

A Quantitative Fitness Analysis Workflow

Published on: August 13, 2012

Scientific Fields:

Data Science
Materials Science
Statistical Modeling

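Background:

Assessing dataset quality is critical for reliable analysis and model performance.
Existing data quality assessment methods are often computationally intensive and model-dependent.
A theoretically grounded, broadly applicable framework is needed to assess data completeness and representativeness.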

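Objectives:

Propose the Data Correlation Convergence (DCC) framework for evaluating dataset quality.
Provide an alternative to traditional methods that are computationally intensive and model-dependent.
Quantify dataset stability under perturbation as a reflection of completeness and representativeness.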

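Main Methods:

DCC integrates multiple correlation functions to quantify numerical correlation and distributional similarity.
The framework assumes that high-quality datasets exhibit stable correlation patterns under perturbation.
The effectiveness of the DCC framework is validated on hypothetical and benchmark datasets.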
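
The summary does not state how DCC is computed. The sketch below is purely a hypothetical illustration of its stated premise: a high-quality dataset keeps a stable correlation pattern under perturbation. It perturbs a dataset by random row subsampling and reports the mean absolute change in pairwise Pearson correlations. Several choices here are assumptions:

- the function `correlation_stability`,
- the 80% subsampling fraction,
- the trial count,
- the use of Pearson correlation alone.

The published framework integrates several correlation functions and a distributional-similarity term. This toy score does not reproduce the reported DCC values or trends.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 when either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_stability(columns, frac=0.8, n_trials=50, seed=0):
    """HYPOTHETICAL stability score, not the published DCC formula.

    Perturbs the dataset by random row subsampling and returns the mean
    absolute change in every pairwise Pearson r relative to the full data.
    Lower values mean the correlation pattern is more stable.
    """
    if len(columns) < 2:
        raise ValueError("need at least two columns")
    n_rows = len(columns[0])
    if n_rows < 3 or any(len(c) != n_rows for c in columns):
        raise ValueError("columns must have equal length of at least 3")
    pairs = list(combinations(range(len(columns)), 2))
    full_r = {(i, j): pearson(columns[i], columns[j]) for i, j in pairs}
    k = min(n_rows, max(3, int(frac * n_rows)))  # rows kept per perturbation
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        rows = rng.sample(range(n_rows), k)
        for i, j in pairs:
            sub_i = [columns[i][r] for r in rows]
            sub_j = [columns[j][r] for r in rows]
            total += abs(pearson(sub_i, sub_j) - full_r[(i, j)])
    return total / (n_trials * len(pairs))
```

For two perfectly linearly related columns the score is 0, because every subsample preserves r = 1. Columns with noisy relationships give larger scores.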

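Key Results:

The lowest DCC values were observed at 10-20% linear correlation, and values increased as correlations became more deterministic.
DCC values effectively predict machine learning model performance metrics (e.g., accuracy, R²) and feature importance (SHAP values).
By capturing intrinsic correlation patterns, DCC enables efficient dataset compression.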

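Conclusions:

The DCC framework offers a theoretically grounded, broadly applicable, and scalable approach to dataset quality assessment.
DCC provides insights into data completeness, representativeness, and potential biases.
This approach can inform better data annotation and selection in scientific research and machine learning applications.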