SmellyCode++: Multi-Label Dataset for Code Smell Detection.

Nawaf Alomari¹, Amal Alazba², Hamoud Aljamaan³,⁴

¹Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia. g201931050@kfupm.edu.sa.

Scientific Data

July 12, 2025

Summary

This summary is machine-generated.

This study introduces a new multi-label dataset for code smell detection, improving realism for software quality analysis. The dataset supports advanced detection methods, enhancing maintainability and refactoring efforts.

Area of Science:

Software Engineering
Computer Science
Data Science

Background:

Code smells signal poor software design that harms maintainability, so accurate detection is needed to guide refactoring.
Current datasets often use single-label classification, which does not reflect the complex, multi-faceted nature of code smell occurrences in real-world projects.
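The difference between single-label and multi-label annotation can be sketched in a few lines of Python. This is only an illustration: the function names, the `"Clean"` fallback label, and the example sample are hypothetical and are not taken from the dataset or its tooling. Only the four smell names come from the study.

```python
# The four smells named in the study; the ordering here is an assumption.
SMELLS = ["God Class", "Data Class", "Feature Envy", "Long Method"]


def multi_hot(present):
    """Encode a set of smell names as a 0/1 vector, one slot per smell."""
    unknown = set(present) - set(SMELLS)
    if unknown:
        raise ValueError(f"unknown smell(s): {sorted(unknown)}")
    return [1 if smell in present else 0 for smell in SMELLS]


def single_label(present):
    """Collapse a set of smells to one label (first match in SMELLS order).

    This mimics a single-label scheme and discards any additional smells.
    """
    for smell in SMELLS:
        if smell in present:
            return smell
    return "Clean"  # hypothetical label for a sample with no smell


# Hypothetical sample that exhibits two smells at once.
sample = {"God Class", "Long Method"}

vector = multi_hot(sample)    # keeps both smells
label = single_label(sample)  # keeps only one of them
```

In this sketch the multi-hot vector retains both smells, while the single-label view silently drops the second one, which is the loss of information the multi-label formulation is meant to avoid.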

Purpose of the Study:

To develop a novel multi-label dataset for code smell detection.
To integrate textual and numerical features from open-source Java projects for a more realistic representation.
To facilitate advanced research and improve the accuracy of code smell detection tools.

Main Methods:

Collected code from 103 open-source Java projects.
Parsed code into Abstract Syntax Trees (ASTs) and extracted relevant features.
Annotated samples for four specific code smells (God Class, Data Class, Feature Envy, Long Method) using data cleaning and unification techniques.
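The parse-then-extract step can be illustrated with a small sketch. Note the substitution: the study parses Java code, whereas the example below uses Python's built-in `ast` module on a Python snippet purely as a stand-in, so that it runs without a third-party Java parser. The feature names (`num_methods`, `max_method_lines`) and the sample class are hypothetical and are not the features used in the dataset.

```python
import ast

# Hypothetical source snippet standing in for a project file.
SOURCE = '''
class Account:
    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount
'''


def class_features(source):
    """Parse source into an AST and return simple numeric features per class."""
    tree = ast.parse(source)
    rows = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Only methods defined directly in the class body are counted.
            methods = [n for n in node.body if isinstance(n, ast.FunctionDef)]
            rows.append({
                "name": node.name,
                "num_methods": len(methods),
                # Length in source lines of the longest method (0 if none).
                "max_method_lines": max(
                    (m.end_lineno - m.lineno + 1 for m in methods), default=0
                ),
            })
    return rows


features = class_features(SOURCE)
```

Counts like these are the kind of numerical features that could sit alongside the raw code text in each sample; a Java pipeline would follow the same shape with a Java parser in place of `ast`.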

Main Results:

Created a dataset of 107,554 samples with multi-label annotations, making detection tasks more realistic.
Reported F1 scores of 95.89% for Data Class, 94.48% for God Class, 88.68% for Feature Envy, and 88.87% for Long Method.
The dataset provides a robust foundation for evaluating and improving code smell detection algorithms.
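For readers unfamiliar with per-smell scoring, the following sketch shows one common way to compute an F1 score separately for each label from multi-hot vectors, using F1 = 2TP / (2TP + FP + FN). The toy labels and predictions are invented for illustration and do not reproduce the figures reported above, and the summary does not state which averaging or evaluation protocol the authors used.

```python
SMELLS = ["God Class", "Data Class", "Feature Envy", "Long Method"]


def f1_per_label(y_true, y_pred):
    """Return {smell: F1} computed independently for each label column."""
    scores = {}
    for j, smell in enumerate(SMELLS):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # By convention here, F1 is 0.0 when the label never occurs
        # in either the ground truth or the predictions.
        scores[smell] = 2 * tp / denom if denom else 0.0
    return scores


# Invented toy data: each row is one sample, each column one smell.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
y_pred = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]

scores = f1_per_label(y_true, y_pred)
```

Scoring each label on its own column means a model can be strong on one smell and weak on another, which is consistent with the per-smell figures being reported separately.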

Conclusions:

The developed dataset is valuable for advanced code smell detection studies, including fine-tuning Large Language Models (LLMs).
Future work can extend the dataset to include other programming languages and additional code smells, increasing its applicability and diversity.
This resource will contribute to better software quality and maintainability through more accurate code smell identification.