The effect of data complexity on classifier performance.

Jonas Eberlein¹, Daniel Rodriguez¹,², Rachel Harrison¹

¹School of Technology, Oxford Brookes University, Headington Campus, Oxford, OX3 0BP UK.

Empirical Software Engineering

November 4, 2024

Summary

This summary is machine-generated.

Software Defect Prediction (SDP) models face performance ceilings. Analyzing data complexity reveals that classifier performance varies by dataset, with some models excelling in specific situations.

Keywords:

Classification; Data complexity metrics; Software defect prediction

Area of Science:

Computer Science
Software Engineering

Background:

Software Defect Prediction (SDP) is a popular research area, typically framed as a classification problem.
Despite advancements in classification, pre-processing, and tuning, SDP models often hit a performance ceiling.
This suggests limitations beyond standard model optimization techniques.

Purpose of the Study:

To analyze classifier performance in SDP from a data complexity perspective.
To investigate the correlation between data complexity metrics and the performance of various machine learning classifiers.
To identify specific strengths and weaknesses of different classifiers across diverse datasets.

Main Methods:

Calculated data complexity metrics using the Unified Bug Dataset, a compilation of well-known SDP datasets.
Evaluated the performance of machine learning classifiers including C5.0, Naive Bayes, Artificial Neural Networks, Random Forests, and Support Vector Machines.
Correlated data complexity metrics with classifier performance to understand their relationships.
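
The correlation step in the methods above can be illustrated with a minimal sketch. This summary does not name the specific complexity metrics or correlation statistic used in the study, so the choices here are assumptions: Fisher's discriminant ratio (a common feature-overlap measure for two-class data) as the complexity metric and Spearman rank correlation as the statistic. The three toy datasets and their performance scores are made up for illustration and are not results from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio over features (labels must be 0/1).

    Higher values mean at least one feature separates the classes well,
    i.e. the dataset is less complex in the feature-overlap sense.
    """
    best = 0.0
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        denom = pvariance(a) + pvariance(b)
        if denom > 0:
            best = max(best, (mean(a) - mean(b)) ** 2 / denom)
    return best


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical one-feature datasets, from well separated to overlapping.
    datasets = [
        ([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1]),
        ([[0.0], [4.0], [5.0], [9.0]], [0, 0, 1, 1]),
        ([[0.0], [6.0], [3.0], [9.0]], [0, 0, 1, 1]),
    ]
    # Hypothetical performance scores of one classifier on each dataset.
    scores = [0.90, 0.70, 0.55]

    complexity = [fisher_ratio(X, y) for X, y in datasets]
    print("Fisher ratios:", complexity)
    print("Spearman correlation with scores:", spearman(complexity, scores))
```

Repeating this for each classifier and each complexity metric would give one correlation per pair, which is the kind of table a study like this could use to compare classifiers.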

Main Results:

Identified distinct domains of competence and incompetence for different classifiers.
Found similarities and differences in classifier performance and their relationship with performance metrics.
Demonstrated that data complexity is a critical factor influencing SDP model performance.
Observed that certain classifiers perform optimally under specific data complexity conditions.

Conclusions:

Classifier performance in Software Defect Prediction is highly dependent on data complexity.
No single classifier is universally superior; optimal choice depends on the specific dataset characteristics.
Data complexity metrics offer valuable insights into understanding and potentially improving SDP model performance.