



Python code smells detection using conventional machine learning models.

Rana Sandouka¹, Hamoud Aljamaan¹

  • ¹Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.

PeerJ Computer Science
June 22, 2023
Summary
This summary is machine-generated.

This study introduces a new Python dataset for detecting Large Class and Long Method code smells. Machine learning models show varying performance, with Random Forest excelling at Large Class detection and Decision Tree at Long Method detection.

Keywords:
Code smell, Detection, Large class, Long method, Machine learning, Python


Area of Science:

  • Software Engineering
  • Machine Learning
  • Data Science

Background:

  • Code smells degrade software quality and complicate maintenance.
  • Existing research on code smell detection primarily uses Java datasets.
  • A gap exists in dedicated Python code smell datasets for machine learning.

Purpose of the Study:

  • To propose and introduce a novel Python code smell dataset.
  • To evaluate the performance of baseline machine learning models for detecting Large Class and Long Method code smells in Python.
  • To establish benchmarks for future Python code smell detection research.
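To make the two smells concrete, the following is a minimal sketch of how size-related measurements for a Long Method or Large Class might be read from Python source with the standard `ast` module. The function name `size_metrics`, the sample class, and the choice of measurements (line span and method count) are illustrative assumptions; this summary does not list the 18 features the study used, so the sketch should not be read as the authors' extraction method.

```python
import ast

# Hypothetical sample input, used only for illustration.
SAMPLE = '''class Basket:
    def add(self, item):
        self.items.append(item)

    def total(self):
        subtotal = 0
        for item in self.items:
            subtotal += item.price
        return subtotal
'''


def size_metrics(source):
    """Return line spans for functions and line spans plus method counts for classes.

    These are assumed, illustrative measurements, not the study's feature set.
    Functions that share a name overwrite each other in this simple sketch.
    """
    tree = ast.parse(source)
    metrics = {"functions": {}, "classes": {}}
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, func_types):
            # end_lineno is available on AST nodes from Python 3.8 onward.
            metrics["functions"][node.name] = node.end_lineno - node.lineno + 1
        elif isinstance(node, ast.ClassDef):
            methods = [n for n in node.body if isinstance(n, func_types)]
            metrics["classes"][node.name] = {
                "lines": node.end_lineno - node.lineno + 1,
                "methods": len(methods),
            }
    return metrics


if __name__ == "__main__":
    print(size_metrics(SAMPLE))
```

Measurements like these could serve as input columns for a classifier; where to draw the line between a normal and a smelly method or class is exactly what a labelled dataset lets a model learn.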

Main Methods:

  • Development of a Python code smell dataset with 1,000 samples each for Large Class and Long Method smells, featuring 18 extracted source code features.
  • Investigation of six machine learning models as baselines for code smell detection.
  • Evaluation of model performance using Accuracy and Matthews Correlation Coefficient (MCC).
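The two evaluation metrics named above can be sketched in plain Python as follows. The helper names and the toy label lists are assumptions made for illustration, and labels are assumed to be 1 for smelly and 0 for non-smelly; the formulas are the standard definitions of Accuracy and the Matthews Correlation Coefficient for binary classification.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, ranging from -1 to 1.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels, not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(accuracy(*counts), mcc(*counts))

    # A classifier that always predicts "not smelly" on imbalanced data.
    skewed = confusion_counts([1] * 10 + [0] * 90, [0] * 100)
    print(accuracy(*skewed), mcc(*skewed))
```

The second case shows why MCC is a useful complement to Accuracy: a classifier that never flags a smell scores 0.9 Accuracy on a 10/90 split but an MCC of 0.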

Main Results:

  • The Random Forest model achieved the highest MCC of 0.77 for Large Class code smell detection.
  • The Decision Tree model demonstrated the best performance for Long Method code smell detection, with an MCC of 0.89.
  • Performance varied across models and code smell types, highlighting the need for tailored approaches.

Conclusions:

  • The developed Python dataset facilitates research into code smell detection for this widely used language.
  • Specific machine learning models show promise for detecting different types of Python code smells.
  • Further research can build upon these findings to improve automated code quality assessment in Python projects.