
Related Concept Videos

Nominal Level of Measurement

The way a set of data is measured is called its level of measurement. Correct statistical procedures depend on a researcher being familiar with levels of measurement. Not every statistical operation can be used with every set of data. For analysis, data are classified into four levels of measurement—nominal, ordinal, interval, and ratio.
Data that cannot be measured numerically but can be grouped into categories fall under the nominal level of measurement. Data measured using a nominal...
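With nominal data, the only meaningful arithmetic is counting how often each category occurs. The minimal sketch below builds a frequency table and finds the mode. The function name `nominal_summary` and the eye-color sample are hypothetical illustrations and do not come from the source.

```python
from collections import Counter

def nominal_summary(values):
    """Frequency table and mode(s) of nominal (categorical) data.

    Nominal categories have no order or numeric spacing, so counting is the
    only meaningful arithmetic: the mode is defined, a mean or median is not.
    """
    if not values:
        return {}, []
    counts = Counter(values)
    top = max(counts.values())
    modes = sorted(v for v, c in counts.items() if c == top)
    return dict(counts), modes

eye_colors = ["brown", "blue", "brown", "green", "hazel", "brown", "blue"]  # hypothetical sample
freq, modes = nominal_summary(eye_colors)
print(freq)   # {'brown': 3, 'blue': 2, 'green': 1, 'hazel': 1}
print(modes)  # ['brown']
```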

Statistical Analysis: Overview

When we take repeated measurements on the same or replicate samples, the results will vary in magnitude. These inconsistencies are called errors. To categorize and characterize these results and their errors, the researcher can use statistical analysis to determine the quality of the measurements and/or the suitability of the methods.
One of the most commonly used statistical quantifiers is the mean, which is the ratio between the sum of the numerical values of all results and the...
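The minimal sketch below computes the mean, defined as the sum of all results divided by the number of results. It also computes the sample standard deviation, a common companion measure of the spread that errors introduce into replicate results. The replicate values are made up for illustration.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of all results divided by the number of results."""
    if not xs:
        raise ValueError("mean of empty data")
    return sum(xs) / len(xs)

def sample_std(xs):
    """Sample standard deviation (n - 1 in the denominator) of replicate results."""
    if len(xs) < 2:
        raise ValueError("need at least two results")
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

replicates = [9.5, 10.0, 10.5, 10.25, 9.75]  # hypothetical repeated measurements
print(mean(replicates))  # 10.0
print(sample_std(replicates))
```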

Ordinal Level of Measurement

The way a set of data is measured is called its level of measurement. Correct statistical procedures depend on a researcher being familiar with levels of measurement. For analysis, data are classified into four levels of measurement—nominal, ordinal, interval, and ratio.
Data measured using an ordinal scale are similar to nominal scale data, but there is one major difference: ordinal data can be ordered. An example of ordinal scale data is a list of the top five national parks...
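Because ordinal categories have a rank but no fixed spacing, it is valid to sort them and take a median, but not to average their ranks. The minimal sketch below assumes a hypothetical five-level satisfaction scale. The names `LEVELS`, `sort_ordinal`, and `ordinal_median` are illustrative only.

```python
# Hypothetical satisfaction scale, listed from lowest to highest category.
LEVELS = ["poor", "fair", "good", "very good", "excellent"]
RANK = {level: i for i, level in enumerate(LEVELS)}

def sort_ordinal(responses):
    """Order responses by category rank rather than alphabetically."""
    return sorted(responses, key=RANK.__getitem__)

def ordinal_median(responses):
    """Median category (the lower middle one for an even count).

    The median only needs an ordering, so it is valid for ordinal data;
    a mean of the ranks would wrongly assume equal spacing between categories.
    """
    ranked = sort_ordinal(responses)
    return ranked[(len(ranked) - 1) // 2]

responses = ["good", "poor", "excellent", "good", "fair"]
print(sort_ordinal(responses))    # ['poor', 'fair', 'good', 'good', 'excellent']
print(ordinal_median(responses))  # good
```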

Statistical Inference Techniques in Hypothesis Testing: Parametric Versus Nonparametric Data

Statistical inference techniques, central to hypothesis testing, fall into two broad categories: parametric and nonparametric statistics.
Parametric statistics, as the name suggests, assume that data follow a specific distribution, often a normal distribution. When this assumption holds, it enables powerful hypothesis testing and estimation. Parametric methods such as Student's t-test are frequently employed in biostatistics because, when their assumptions are met, they have greater statistical power than nonparametric alternatives. For instance,...
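The contrast can be made concrete by computing one parametric and one nonparametric test statistic on the same two groups. The minimal sketch below computes only the statistics, not p-values. It uses Student's pooled-variance t and the Mann-Whitney U (a common rank-based alternative that the source does not name). The sample values are hypothetical.

```python
import math

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance (parametric:
    assumes roughly normal data with equal variances in both groups)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def mann_whitney_u(a, b):
    """Mann-Whitney U for sample a (nonparametric: uses only the ordering
    of values). Counts pairs where a beats b; ties count one half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

a = [5.1, 5.8, 6.0, 6.3, 7.2]
b = [4.2, 4.9, 5.0, 5.5, 9.9]  # one extreme value in group b
print(student_t(a, b))       # small t: the outlier inflates the pooled variance
print(mann_whitney_u(a, b))  # 19.0 of 25 possible pairs favor group a
```

On this toy data, the rank-based U shows that group a tends to exceed group b. Meanwhile, the single extreme value in b shrinks t. This kind of assumption violation is what motivates nonparametric methods. For p-values, a statistics library such as SciPy would normally be used instead.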

Survival Tree

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a...
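Survival trees are commonly grown by choosing, at each node, the covariate split that best separates the survival experience of the two child nodes. The log-rank statistic is one widely used split criterion, although the source does not say which criterion it has in mind.

The minimal sketch below scores candidate thresholds on a single covariate with the log-rank chi-square statistic. It handles censored subjects by keeping them in the risk set up to their censoring time. The function names and toy data are hypothetical, and a full tree would recurse on each child node and add stopping rules.

```python
def logrank_statistic(times, events, in_left):
    """Log-rank chi-square statistic comparing two groups of right-censored data.

    times:   follow-up time of each subject
    events:  1 if the event was observed, 0 if the subject was censored
    in_left: True if the subject falls in the left child of a candidate split
    """
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        # Subjects still under observation at time t (censored ones stay until their time).
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_left[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_left[i])
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in the left child
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


def best_split(x, times, events, min_node=2):
    """Return (threshold, statistic) for the split x <= threshold that maximizes
    the log-rank statistic; each child must keep at least min_node subjects."""
    best_threshold, best_stat = None, 0.0
    for c in sorted(set(x))[:-1]:
        in_left = [xi <= c for xi in x]
        n_left = sum(in_left)
        if min_node <= n_left <= len(x) - min_node:
            stat = logrank_statistic(times, events, in_left)
            if stat > best_stat:
                best_threshold, best_stat = c, stat
    return best_threshold, best_stat


# Hypothetical toy data: covariate x, survival time, and event indicator (0 = censored).
x = [1, 2, 3, 4, 5, 6, 7, 8]
times = [2, 3, 3, 4, 10, 12, 15, 18]
events = [1, 1, 1, 0, 1, 1, 0, 1]
print(best_split(x, times, events))  # threshold 3, statistic about 7.34
```

In this toy data, the subject censored at time 4 contributes no event but still counts as at risk. This is why the split at x <= 3 scores higher than the split at x <= 4.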

Ratio Level of Measurement

The way a set of data is measured is called its level of measurement. Correct statistical procedures depend on a researcher being familiar with levels of measurement. For analysis, data are classified into four levels of measurement—nominal, ordinal, interval, and ratio.
A set of data measured on the ratio scale overcomes the main limitation of the interval scale, namely that ratios are not meaningful, and provides complete information. Ratio scale data are like interval scale data, except that they have a true zero point, so ratios can be calculated...
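A standard illustration of the difference is temperature. Celsius is an interval scale with an arbitrary zero, whereas Kelvin is a ratio scale with a true zero at absolute zero. The minimal sketch below shows that dividing Celsius readings gives a misleading ratio, while dividing the same readings in Kelvin gives a meaningful one. The example values are hypothetical.

```python
ABSOLUTE_ZERO_OFFSET = 273.15  # 0 K corresponds to -273.15 degrees Celsius

def celsius_to_kelvin(c):
    """Convert an interval-scale Celsius reading to the ratio-scale Kelvin scale."""
    return c + ABSOLUTE_ZERO_OFFSET

warm_c, cool_c = 20.0, 10.0
naive_ratio = warm_c / cool_c  # 2.0, but 20 C is not "twice as hot" as 10 C
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)  # meaningful on a ratio scale
print(naive_ratio, round(true_ratio, 3))  # 2.0 1.035
```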

You May Also Read

Related Articles

Articles related to this one through co-authors, journal, and citation graph.

Same author

Gender-based data bias and model fairness evaluation in benchmarked open-access disease prediction datasets.

Computers in biology and medicine·2026

Same author

The impact of K selection in K‑fold cross-validation on bias and variance in supervised learning models.

Scientific reports·2026

Same author

FG-DDI: Functional group-aware graph neural networks for drug-drug interaction prediction.

Journal of biomedical informatics·2026

Same author

Impact of music-based interventions on subjective well-being: a meta-analysis of listening, training, and therapy in clinical and nonclinical populations.

Frontiers in psychology·2025

Same author

Toward fair medical advice: Addressing and mitigating bias in large language model-based healthcare applications.

Artificial intelligence in medicine·2025

Same author

A Dataset of Stakeholder Networks for Project Performance Analysis.

Scientific data·2025

Same journal

Turbulent flow in a vortex separator with a directed pipe inlet.

Scientific reports·2026

Same journal

Systematic characteristic evaluation of clay-based cementitious material derived from calcium carbide residue and waste tile powder.

Scientific reports·2026

Same journal

Retraction Note: Improvement of a rapid diagnostic application of monoclonal antibodies against avian influenza H7 subtype virus using Europium nanoparticles.

Scientific reports·2026

Same journal

Applying large language models to spam detection in the Kazakh low-resource language setting.

Scientific reports·2026

Same journal

An open-source 3D printing system enabling in-situ freeze-thaw processing of hydrogels.

Scientific reports·2026

Same journal

An enhanced EfficientNet framework for automated waste classification using cosine annealing and label smoothing.

Scientific reports·2026

Related Experiment Videos

Updated: Jul 5, 2025

Machine Learning Algorithms for Early Detection of Bone Metastases in an Experimental Rat Model

Published on: August 16, 2020

Dataset meta-level and statistical features affect machine learning performance.

Shahadat Uddin¹, Haohui Lu²

¹School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, 2037, Australia. shahadat.uddin@sydney.edu.au.

Scientific reports

January 18, 2024

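Summary

This summary is machine-generated.

Dataset characteristics significantly affect machine learning (ML) performance. Kurtosis negatively affects non-tree-based algorithms such as support vector machines (SVM), logistic regression (LR), and K-nearest neighbors (KNN), while meta-level and statistical features affect accuracy when the datasets are balanced.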

More Related Videos

Measuring Statistical Learning Across Modalities and Domains in School-Aged Children Via an Online Platform and Neuroimaging Techniques

Published on: June 30, 2020

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Fields of Science:

Computer Science
Machine Learning
Data Science

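Background:

The influence of dataset characteristics on the performance of machine learning (ML) algorithms remains largely unexplored in the existing literature.
Understanding these relationships is essential for selecting the best ML model and improving prediction accuracy.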

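Objectives:

Investigate the impact of meta-level and statistical features of tabular datasets on the performance of various ML algorithms.
Identify which dataset characteristics significantly affect ML model accuracy across different algorithms and implementations.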

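Methods:

Analyzed 200 open-access tabular datasets from Kaggle and the UCI Machine Learning Repository.
Examined meta-level features (dataset size, number of attributes, ratio) and statistical features (mean, standard deviation, skewness, kurtosis).
Developed ML classification models (support vector machine, logistic regression, K-nearest neighbors, decision tree, random forest) using both classical and hyperparameter-tuned implementations.
Used multiple regression models to assess the impact of dataset characteristics on ML performance.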
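The features listed above can be computed directly from a numeric table. The source does not define "ratio" or how per-attribute statistics were combined into one value per dataset, so this minimal sketch makes hypothetical choices and flags them in the code:

- "ratio" is taken to mean instances per attribute.
- Class balance is the minority-to-majority class count.
- Skewness and excess kurtosis use simple moment estimators averaged across attributes.

This is an illustration, not the study's feature-extraction code.

```python
import math
from collections import Counter

def moments(xs):
    """Mean, sample SD, moment skewness (g1) and excess kurtosis (g2) of one attribute.

    Assumes at least two values and a non-constant attribute.
    """
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    sd = math.sqrt(m2 * n / (n - 1))  # sample SD (n - 1 denominator)
    return m, sd, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def dataset_features(rows, labels):
    """Meta-level and statistical features of a numeric tabular dataset.

    HYPOTHETICAL choices, not the study's published definitions:
      - "ratio" = number of instances per attribute
      - "class_balance" = minority class count / majority class count
      - per-attribute statistics are averaged across attributes
    """
    n_rows, n_attrs = len(rows), len(rows[0])
    per_attr = [moments([r[j] for r in rows]) for j in range(n_attrs)]
    counts = list(Counter(labels).values())

    def avg(k):
        return sum(stats[k] for stats in per_attr) / n_attrs

    return {
        "size": n_rows,
        "n_attributes": n_attrs,
        "ratio": n_rows / n_attrs,
        "class_balance": min(counts) / max(counts),
        "mean": avg(0),
        "std": avg(1),
        "skewness": avg(2),
        "kurtosis": avg(3),
    }


rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # hypothetical toy data
labels = ["a", "a", "a", "b"]
print(dataset_features(rows, labels))
```

The evenly spaced toy columns give a negative excess kurtosis (about -1.36). Heavy-tailed attributes would give large positive values, and the study associates higher kurtosis with lower accuracy for SVM, LR, and KNN.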

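Key Results:

Kurtosis had a significant negative effect on the accuracy of non-tree-based algorithms (SVM, LR, KNN) in their classical implementations.
Meta-level and statistical features had minimal impact on tree-based algorithms (decision tree, random forest), except in specific hyperparameter-tuning scenarios.
When imbalanced datasets were excluded, the meta-level ratio feature and the statistical mean and standard deviation features significantly affected the accuracy of SVM, LR, and KNN.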
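Findings such as "kurtosis had a significant negative effect" come from regressing an algorithm's accuracy on dataset characteristics. The source does not give the exact model specification.

The minimal sketch below shows only the coefficient-estimation step: ordinary least squares with an intercept, solved through the normal equations in pure Python. It does not compute significance tests, for which a statistics package such as statsmodels would normally be used. The toy data are hypothetical and were generated so that the true coefficients are known in advance.

```python
def ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X: one row of dataset characteristics per dataset (e.g. [kurtosis, skewness])
    y: accuracy of one algorithm on each of those datasets
    Returns [intercept, coef_1, ..., coef_p].
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # design matrix with intercept
    p = len(A[0])
    xtx = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    xty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the system [X^T X | X^T y].
    M = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("features are collinear; coefficients are not identifiable")
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


# Hypothetical toy data built from accuracy = 0.9 - 0.02 * kurtosis + 0.01 * skewness.
X = [[0, 0], [1, 0], [0, 1], [2, 1], [3, 2]]
y = [0.9 - 0.02 * k + 0.01 * s for k, s in X]
coef = ols(X, y)
print([round(c, 4) for c in coef])  # [0.9, -0.02, 0.01]
```

A negative fitted coefficient on kurtosis would correspond to the direction of effect reported above. Whether that coefficient is statistically significant requires standard errors, which this sketch omits.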

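Conclusions:

Dataset characteristics, particularly kurtosis and class imbalance, play a key role in ML algorithm performance.
The findings indicate that non-tree-based algorithms are more sensitive to specific statistical properties of datasets.
This study opens new avenues for understanding how datasets and algorithms interact, and it can guide the selection of appropriate ML models for optimal results.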