Distributional bias compromises leave-one-out cross-validation.

George I Austin^1,2, Itsik Pe'er^2,3, Tal Korem^2,4

¹Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.

Science Advances | November 28, 2025

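Summary

This summary is machine-generated.

Cross-validation can introduce a "distributional bias" that negatively affects the evaluation of machine learning models. A new rebalanced cross-validation method corrects this bias and improves performance evaluation across a variety of machine learning tasks.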

Scientific fields:

Machine learning
Statistical modeling
Data science

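Background:

Cross-validation is a standard technique for evaluating how well machine learning models generalize.
In low-data settings, leave-one-out cross-validation (LOOCV) is frequently used.
Aggregating the predictions from all LOOCV folds before computing a performance metric is common practice.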
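
To make "aggregated" LOOCV concrete, here is a minimal NumPy sketch. It assumes a generic `fit_predict(X_train, y_train, X_test)` callable; that interface, the example model `linear_fit_predict`, and the synthetic data are hypothetical illustrations, not an API from the paper or from any library. Each sample is held out once, and the n held-out predictions are pooled before a single metric (here Pearson's r) is computed.

```python
import numpy as np


def loocv_pooled_predictions(X, y, fit_predict):
    """Aggregated LOOCV: hold out each sample once and pool the held-out predictions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # boolean mask selecting every sample except i
        preds[i] = fit_predict(X[train], y[train], X[i:i + 1])[0]
    return preds


def linear_fit_predict(X_train, y_train, X_test):
    """Hypothetical example model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=20)

preds = loocv_pooled_predictions(X, y, linear_fit_predict)
# One metric computed over all n pooled predictions (not averaged per fold).
r = np.corrcoef(preds, y)[0, 1]
print(f"pooled LOOCV Pearson r = {r:.3f}")
```

Pooling is needed because each LOOCV test fold contains a single sample, on which metrics such as a correlation or an auROC are undefined.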

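Study objectives:

To identify, and mathematically prove, the existence of a "distributional bias" in aggregated cross-validation.
To demonstrate the negative effects of distributional bias on model evaluation and hyperparameter tuning.
To develop and validate a new cross-validation method that is robust to distributional bias.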

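Main methods:

A theoretical proof establishing a negative correlation between the mean label of each training fold and the label of the corresponding test instance.
Empirical validation across a range of machine learning tasks, models, and evaluation metrics.
Development and simulation of a rebalanced cross-validation technique to mitigate the bias.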
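
The source of the negative correlation can be seen in the simplest possible case, a model whose only learned quantity is the mean training label. The derivation below illustrates that special case under this assumption; it is not a reproduction of the paper's proof. With labels $y_1,\dots,y_n$ and $\bar{y}_{-i}$ the mean label of the training fold that leaves out sample $i$:

```latex
\begin{aligned}
S &= \sum_{j=1}^{n} y_j, \qquad
\bar{y}_{-i} = \frac{S - y_i}{n-1} = \frac{S}{n-1} - \frac{1}{n-1}\, y_i, \\
\Rightarrow\quad \operatorname{corr}\big(\bar{y}_{-i},\, y_i\big)_{i=1}^{n} &= -1
\quad \text{(whenever the } y_i \text{ are not all equal)}, \\
y_i \in \{0,1\},\ k = \textstyle\sum_{j} y_j:\quad
\bar{y}_{-i} &= \begin{cases} \dfrac{k-1}{n-1}, & y_i = 1, \\[4pt] \dfrac{k}{n-1}, & y_i = 0. \end{cases}
\end{aligned}
```

Because $\bar{y}_{-i}$ is a decreasing affine function of $y_i$, a model that predicts the training-fold mean ranks every negative above every positive, which yields an auROC of 0 once the held-out predictions are pooled.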

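Main results:

Distributional bias was shown to be an inherent artifact of aggregated LOOCV that negatively affects measured performance.
The bias was observed across diverse machine learning applications and can unfairly penalize strong regularization.
The proposed rebalanced cross-validation method showed improved accuracy and stability in both simulations and benchmarks.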
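
The following NumPy sketch illustrates two of these results in simplified, assumed settings. The synthetic data and the helper names `auroc`, `loocv_mean_scores`, and `loocv_ridge_predictions` are hypothetical, and this is not the paper's benchmark. A "dummy" classifier that scores each held-out sample with its training-fold mean label gets an auROC of 0 under aggregated LOOCV. Ridge regression with a very large penalty, whose predictions collapse toward the training-fold mean, gets a strongly negative pooled correlation even though the features are informative.

```python
import numpy as np


def auroc(scores, labels):
    # Mann-Whitney form: P(random positive scores above random negative), ties count 1/2.
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def loocv_mean_scores(labels):
    # Dummy model: each held-out sample is scored with the mean label of its training fold.
    y = np.asarray(labels, dtype=float)
    return (y.sum() - y) / (len(y) - 1)


def loocv_ridge_predictions(X, y, lam):
    # Ridge regression with an unpenalized intercept, refit on every leave-one-out fold.
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xt, yt = X[keep], y[keep]
        x_mean, y_mean = Xt.mean(axis=0), yt.mean()
        Xc = Xt - x_mean
        beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (yt - y_mean))
        preds[i] = y_mean + (X[i] - x_mean) @ beta
    return preds


labels = np.array([1] * 10 + [0] * 10)
print(auroc(loocv_mean_scores(labels), labels))  # 0.0: every negative outranks every positive

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=30)
for lam in (1e-3, 1e6):
    r = np.corrcoef(loocv_ridge_predictions(X, y, lam), y)[0, 1]
    print(f"lambda={lam:g}: pooled LOOCV r = {r:.3f}")
```

With a tiny penalty the pooled correlation is strongly positive; with a huge penalty the slope vanishes and only the training-fold mean remains, so the pooled correlation approaches -1. A hyperparameter search driven by this pooled metric would therefore score the strongly regularized model below a constant predictor.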

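Conclusions:

Aggregated leave-one-out cross-validation introduces a systematic distributional bias that undermines the reliability of model evaluation.
A new rebalanced cross-validation strategy effectively mitigates this bias in both classification and regression.
This approach offers a more accurate and reliable way to evaluate machine learning models, particularly in data-scarce settings.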
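
One way a rebalancing scheme could work is sketched below. This is an assumption made for illustration, not the authors' published procedure. For binary labels, when a sample is held out, one randomly chosen sample of the opposite class is also dropped from training, so every training fold has the same class counts and its mean label no longer depends on the held-out label. The generator name `rebalanced_loo_splits` is hypothetical, and the sketch covers classification only.

```python
import numpy as np


def rebalanced_loo_splits(labels, seed=0):
    """Yield (train_idx, test_idx) pairs for an assumed rebalanced leave-one-out scheme.

    ASSUMPTION (illustrative sketch, not necessarily the paper's exact procedure):
    besides the held-out sample, drop one randomly chosen sample of the opposite
    class from training, so all training folds share the same class counts.
    Requires binary labels with at least one sample in each class.
    """
    y = np.asarray(labels)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    for i in idx:
        others = idx[y != y[i]]  # candidates from the opposite class
        j = rng.choice(others)
        train = idx[(idx != i) & (idx != j)]
        yield train, np.array([i])


labels = np.array([1] * 10 + [0] * 10)
scores = np.empty(len(labels))
for train, test in rebalanced_loo_splits(labels):
    scores[test] = labels[train].mean()  # same dummy "training-fold mean" model
print(np.unique(scores))  # [0.5]: every training fold holds 9 positives and 9 negatives
```

Because every training fold now has identical class counts, the dummy model assigns the same score to every sample, so the pooled predictions no longer carry a spurious negative signal. The trade-off in this design is one fewer training sample per fold.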