Search research articles

Related Concept Videos

Quantifying and Rejecting Outliers: The Grubbs Test

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...

Types of Errors: Detection and Minimization

Types of Errors: Detection and Minimization

Error is the deviation of the obtained result from the true, expected value or the estimated central value. Errors are expressed in absolute or relative terms.
Absolute error in a measurement is the numerical difference from the true or central value. Relative error is the ratio between absolute error and the true or central value, expressed as a percentage.
Errors can be classified by source, magnitude, and sign. There are three types of errors: systematic, random, and gross.
Systematic or...

Detection of Gross Error: The Q Test

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, there is a need to determine whether they are outliers and whether they should be eliminated from the data set to ensure an accurate representation of the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...

Classification of Signals

Classification of Signals

In signal processing, signals are classified based on various characteristics: continuous-time versus discrete-time, periodic versus aperiodic, analog versus digital, and causal versus noncausal. Each category highlights distinct properties crucial for understanding and manipulating signals.
A continuous-time signal holds a value at every instant in time, representing information seamlessly. In contrast, a discrete-time signal holds values only at specific moments, often denoted as x(n), where...

Goodness-of-Fit Test

Goodness-of-Fit Test

The goodness-of-fit test is a type of hypothesis test which determines whether the data "fits" a particular distribution. For example, one may suspect that some anonymous data may fit a binomial distribution. A chi-square test (meaning the distribution for the hypothesis test is chi-square) can be used to determine if there is a fit. The null and alternative hypotheses may be written in sentences or stated as equations or inequalities. The test statistic for a goodness-of-fit test is given as...

Survival Tree

Survival Tree

Survival trees are a non-parametric method used in survival analysis to model the relationship between a set of covariates and the time until an event of interest occurs, often referred to as the "time-to-event" or "survival time." This method is particularly useful when dealing with censored data, where the event has not occurred for some individuals by the end of the study period, or when the exact time of the event is unknown.
Building a Survival Tree
Constructing a...

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Sort by

Same author

DIVE: A Multi-Label Smart Contract Vulnerability Dataset.

Scientific data·2026

Same author

SmellyCode++: Multi-Label Dataset for Code Smell Detection.

Scientific data·2025

Same author

Dynamic stacking ensemble for cross-language code smell detection.

PeerJ. Computer science·2024

Same journal

DARUMA: a gateway to fast and easy prediction of intrinsically disordered regions.

PeerJ. Computer science·2026

Same journal

Alzheimer's disease detection using a quantum deep neural network with Haralick feature extraction and simulated annealing optimization.

PeerJ. Computer science·2026

Same journal

Network anomaly detection using Deep Autoencoder and parallel Artificial Bee Colony algorithm-trained neural network.

PeerJ. Computer science·2026

Same journal

An anomaly detection model for multivariate time series with anomaly perception.

PeerJ. Computer science·2026

Same journal

Retraction: A wormhole attack detection method for tactical wireless sensor networks.

PeerJ. Computer science·2026

Same journal

Evaluation of mental disorder with prioritization of its type by utilizing the bipolar complex fuzzy decision-making approach based on Schweizer-Sklar prioritized aggregation operators.

PeerJ. Computer science·2026

See all related articles

ABOUT JoVE

Overview Leadership Blog JoVE Help Center

AUTHORS

Publishing Process Editorial Board Scope & Policies Peer Review FAQ Submit

LIBRARIANS

Testimonials Subscriptions Access Resources Library Advisory Board FAQ

RESEARCH

JoVE Journal Methods Collections JoVE Encyclopedia of Experiments Archive

EDUCATION

JoVE Core JoVE Business JoVE Science Education JoVE Lab Manual Faculty Resource Center Faculty Site

Terms & Conditions of Use

Search research articles

Related Experiment Video

Updated: Jul 26, 2025

Design and Analysis for Fall Detection System Simplification

Design and Analysis for Fall Detection System Simplification

Published on: April 6, 2020

Python code smells detection using conventional machine learning models.

Rana Sandouka¹, Hamoud Aljamaan¹

¹Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.

Peerj. Computer Science

|June 22, 2023

Summary

This summary is machine-generated.

This study introduces a new Python dataset for detecting Large Class and Long Method code smells. Machine learning models show varying performance, with Random Forest excelling at Large Class detection and Decision Tree at Long Method detection.

Keywords:

Code smell Detection Large class Long method Machine learning Python

More Related Videos

Machine Learning Algorithms for Early Detection of Bone Metastases in an Experimental Rat Model

Machine Learning Algorithms for Early Detection of Bone Metastases in an Experimental Rat Model

Published on: August 16, 2020

Author Spotlight: AI-Driven Trypanosome Species Detection from Microscopic Images

Author Spotlight: AI-Driven Trypanosome Species Detection from Microscopic Images

Published on: October 27, 2023

Related Experiment Videos

Last Updated: Jul 26, 2025

Design and Analysis for Fall Detection System Simplification

Design and Analysis for Fall Detection System Simplification

Published on: April 6, 2020

Machine Learning Algorithms for Early Detection of Bone Metastases in an Experimental Rat Model

Machine Learning Algorithms for Early Detection of Bone Metastases in an Experimental Rat Model

Published on: August 16, 2020

Author Spotlight: AI-Driven Trypanosome Species Detection from Microscopic Images

Author Spotlight: AI-Driven Trypanosome Species Detection from Microscopic Images

Published on: October 27, 2023

Area of Science:

Software Engineering
Machine Learning
Data Science

Background:

Code smells degrade software quality and complicate maintenance.
Existing research on code smell detection primarily uses Java datasets.
A gap exists in dedicated Python code smell datasets for machine learning.

Purpose of the Study:

To propose and introduce a novel Python code smell dataset.
To evaluate the performance of baseline machine learning models for detecting Large Class and Long Method code smells in Python.
To establish benchmarks for future Python code smell detection research.

Main Methods:

Development of a Python code smell dataset with 1,000 samples each for Large Class and Long Method smells, featuring 18 extracted source code features.
Investigation of six machine learning models as baselines for code smell detection.
Evaluation of model performance using Accuracy and Matthews Correlation Coefficient (MCC).

Main Results:

The Random Forest model achieved the highest MCC of 0.77 for Large Class code smell detection.
The Decision Tree model demonstrated the best performance for Long Method code smell detection, with an MCC of 0.89.
Performance varied across models and code smell types, highlighting the need for tailored approaches.

Conclusions:

The developed Python dataset facilitates research into code smell detection for this widely used language.
Specific machine learning models show promise for detecting different types of Python code smells.
Further research can build upon these findings to improve automated code quality assessment in Python projects.