The effect of data complexity on classifier performance.

Jonas Eberlein1, Daniel Rodriguez1,2, Rachel Harrison1

  • 1School of Technology, Oxford Brookes University, Headington Campus, Oxford, OX3 0BP UK.

Empirical Software Engineering | November 4, 2024
Summary
This summary is machine-generated.

Software Defect Prediction (SDP) models face performance ceilings. Analyzing data complexity reveals that classifier performance varies by dataset, with some models excelling in specific situations.

Keywords: Classification; Data complexity metrics; Software defect prediction

Area of Science:

  • Computer Science
  • Software Engineering

Background:

  • Software Defect Prediction (SDP) is a popular research area, typically framed as a classification problem.
  • Despite advancements in classification, pre-processing, and tuning, SDP models often hit a performance ceiling.
  • This suggests limitations beyond standard model optimization techniques.

Purpose of the Study:

  • To analyze classifier performance in SDP from a data complexity perspective.
  • To investigate the correlation between data complexity metrics and the performance of various machine learning classifiers.
  • To identify specific strengths and weaknesses of different classifiers across diverse datasets.

Main Methods:

  • Calculated data complexity metrics using the Unified Bug Dataset, a compilation of well-known SDP datasets.
  • Evaluated the performance of machine learning classifiers including C5.0, Naive Bayes, Artificial Neural Networks, Random Forests, and Support Vector Machines.
  • Correlated data complexity metrics with classifier performance to understand their relationships.
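The summary does not say which complexity metrics or which correlation statistic the study used. As a minimal sketch of the general idea, the Python below computes one well-known complexity measure, the maximum Fisher's discriminant ratio, for a two-class dataset and then rank-correlates a metric with classifier scores using Spearman's coefficient. The function names, the choice of metric, the choice of Spearman, and all numbers are illustrative assumptions, not values or code from the paper.

```python
from statistics import mean, pvariance


def fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio over features (two classes only).

    Higher values mean at least one feature separates the classes well,
    i.e. the dataset is less complex by this measure.
    """
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    best = 0.0
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == classes[0]]
        b = [row[j] for row, label in zip(X, y) if label == classes[1]]
        gap = (mean(a) - mean(b)) ** 2
        spread = pvariance(a) + pvariance(b)
        if spread == 0:
            if gap > 0:
                return float("inf")  # feature separates the classes perfectly
            continue  # feature is constant across both classes
        best = max(best, gap / spread)
    return best


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation (undefined if either input is constant)."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy two-class dataset (made up): feature 0 separates the classes.
X = [[1.0, 5.0], [2.0, 5.0], [8.0, 5.0], [9.0, 5.0]]
y = [0, 0, 1, 1]
toy_f1 = fisher_ratio(X, y)

# Made-up metric values and classifier scores for four hypothetical datasets.
metric_per_dataset = [0.2, 1.5, 3.0, 98.0]
score_per_dataset = [0.55, 0.61, 0.70, 0.93]
rho = spearman(metric_per_dataset, score_per_dataset)
```

In practice the correlation would be computed once per classifier and per metric, giving a table of coefficients that can be compared across classifiers.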

Main Results:

  • Identified distinct domains of competence and incompetence for different classifiers.
  • Found similarities and differences in classifier performance and their relationship with performance metrics.
  • Demonstrated that data complexity is a critical factor influencing SDP model performance.
  • Observed that certain classifiers perform optimally under specific data complexity conditions.
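One simple way to picture a "domain of competence" is as the range of a complexity metric over which a classifier reaches an acceptable score. The sketch below assumes that reading; the summary does not describe how the study derived its domains, and the function name, threshold, dataset labels, and scores are all hypothetical.

```python
def competence_ranges(scores, complexity, threshold):
    """Return, per classifier, the (min, max) complexity of datasets it handles well.

    scores:     {classifier: {dataset: score}}
    complexity: {dataset: complexity metric value}
    A classifier with no dataset at or above the threshold maps to None.
    """
    out = {}
    for clf, per_dataset in scores.items():
        good = [complexity[d] for d, s in per_dataset.items() if s >= threshold]
        out[clf] = (min(good), max(good)) if good else None
    return out


# Made-up values for three hypothetical datasets and two hypothetical classifiers.
complexity = {"A": 0.2, "B": 0.5, "C": 0.9}
scores = {
    "clf_x": {"A": 0.80, "B": 0.75, "C": 0.55},
    "clf_y": {"A": 0.60, "B": 0.62, "C": 0.58},
}
domains = competence_ranges(scores, complexity, threshold=0.7)
```

This is deliberately coarse: the reported range can contain datasets that fall below the threshold, so a real analysis would need a finer rule for delimiting each domain.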

Conclusions:

  • Classifier performance in Software Defect Prediction is highly dependent on data complexity.
  • No single classifier is universally superior; optimal choice depends on the specific dataset characteristics.
  • Data complexity metrics offer valuable insights into understanding and potentially improving SDP model performance.