



ReSmooth: Detecting and Utilizing OOD Samples When Training With Data Augmentation.

Chenyang Wang, Junjun Jiang, Xiong Zhou

    IEEE Transactions on Neural Networks and Learning Systems | November 23, 2022
    Summary
    This summary is machine-generated.

    This article introduces ReSmooth, a new computational framework designed to improve deep learning by identifying and managing low-quality, out-of-distribution training data created during image augmentation processes. By separating reliable data from noisy samples, the system optimizes how models learn from diverse inputs.

    Keywords:
    neural networks, image classification, synthetic data, loss distribution


    Area of Science:

    • Computer vision and machine learning for image classification
    • Statistical modeling and data augmentation optimization

    Background:

    Deep learning models often rely on augmented datasets to improve generalization. Aggressive, high-diversity augmentation, however, frequently generates samples that deviate from the original training distribution, and such samples can degrade predictive accuracy. Standard training protocols nevertheless treat all augmented inputs as equally valid, which can limit performance when the quality of synthetic data varies significantly. Earlier work had not resolved this negative effect of high-diversity augmentation on model stability, and the field lacked robust mechanisms for handling problematic inputs during training. This gap motivated the development of methods that distinguish helpful from harmful synthetic data points.

    Purpose Of The Study:

    The aim of this study is to introduce a framework that detects and utilizes out-of-distribution samples during data augmentation. This research addresses the problem where high-diversity augmentation strategies introduce noisy samples that impair model performance. The authors seek to optimize the training process by distinguishing between reliable and problematic synthetic data. They propose a method to categorize inputs into in-distribution and out-of-distribution sets. The motivation stems from the need to improve how deep neural networks learn from diverse augmented datasets. By treating these two types of data differently, the researchers intend to maximize the benefits of augmentation. The study focuses on creating a flexible system that works with existing augmentation techniques. This work explores whether unequal treatment of training samples can lead to superior classification outcomes.

    Main Methods:

    The authors fit a Gaussian mixture model to the loss distribution of the training inputs. The fitted model separates in-distribution samples from the out-of-distribution samples created by augmentation. In a subsequent training cycle, the two groups are assigned different smooth labels, so that the model learns differently from data of differing quality. The team integrates the method with established augmentation techniques such as RandAugment, rotation, and jigsaw, and runs experiments on multiple classification benchmarks, comparing the results against the baseline augmentation strategies. The pipeline is designed to be compatible with existing neural network architectures.

    Main Results:

    The framework consistently improves classification performance across the tested benchmarks. The authors report that the method identifies out-of-distribution samples and separates them from the remaining training data, with the Gaussian mixture model partitioning inputs according to their loss distribution. Applying different smooth labels to the two groups lets the model make better use of diverse synthetic inputs. The approach also largely improves the classification performance of negative data augmentation strategies, and it integrates with existing tools such as RandAugment. The researchers observe that treating samples unequally leads to more stable training outcomes. These results suggest that managing synthetic data quality is an effective strategy for improving deep neural networks.

    Conclusions:

    The authors conclude that their framework mitigates the performance degradation caused by noisy synthetic data. Treating samples differently according to whether they are in-distribution or out-of-distribution improves model robustness, and classification accuracy increases when models are trained with labels tailored to each group. The study highlights that intentionally created out-of-distribution samples can be harnessed for better performance rather than simply discarded. The approach integrates with existing augmentation pipelines such as RandAugment, and the authors suggest that it offers a flexible solution across image classification benchmarks. Overall, the work supports managing synthetic data quality as a viable path to better neural network training.

    The researchers propose a Gaussian mixture model to analyze loss distributions. By fitting these distributions, the system identifies out-of-distribution samples, which are then assigned different smooth labels compared to in-distribution data to improve overall classification performance.

    The framework utilizes a Gaussian mixture model to categorize training inputs. This statistical tool allows the system to partition data into in-distribution and out-of-distribution sets based on their respective loss values during the initial training phase.
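
    As a rough illustration of the partitioning step described above, the sketch below fits a one-dimensional, two-component Gaussian mixture to per-sample loss values with expectation-maximization and returns the posterior probability that a loss belongs to the high-loss component. It is not the authors' implementation: the function names, the initialization, the iteration count, and the idea of treating the high-loss component as the out-of-distribution group are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(losses, iters=200):
    """Fit a 1-D, two-component Gaussian mixture to loss values with EM.

    Returns (means, variances, weights), each a list of two floats.
    Hypothetical helper for illustration only.
    """
    lo, hi = min(losses), max(losses)
    means = [lo, hi]                              # start at the extremes
    spread = max((hi - lo) ** 2 / 4.0, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    n = len(losses)
    for _ in range(iters):
        # E-step: responsibility of component 1 for every sample.
        resp = []
        for x in losses:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:                # one component vanished
            break
        # M-step: re-estimate means, variances, and mixing weights.
        m0 = sum((1 - r) * x for r, x in zip(resp, losses)) / n0
        m1 = sum(r * x for r, x in zip(resp, losses)) / n1
        v0 = sum((1 - r) * (x - m0) ** 2 for r, x in zip(resp, losses)) / n0
        v1 = sum(r * (x - m1) ** 2 for r, x in zip(resp, losses)) / n1
        means = [m0, m1]
        variances = [max(v0, 1e-6), max(v1, 1e-6)]  # floor avoids collapse
        weights = [n0 / n, n1 / n]
    return means, variances, weights


def high_loss_probability(x, means, variances, weights):
    """Posterior probability that loss x came from the higher-mean component."""
    hi = 1 if means[1] >= means[0] else 0
    p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in (0, 1)]
    total = p[0] + p[1]
    return p[hi] / total if total > 0 else 0.5


if __name__ == "__main__":
    # Synthetic losses only: a low-loss cluster and a smaller high-loss cluster.
    random.seed(0)
    losses = [random.gauss(0.2, 0.05) for _ in range(200)]
    losses += [random.gauss(2.0, 0.3) for _ in range(50)]
    params = fit_two_component_gmm(losses)
    flagged = [x for x in losses if high_loss_probability(x, *params) > 0.5]
    print(len(flagged), "samples assigned to the high-loss component")
```

    The 0.5 threshold on the posterior is an arbitrary choice for this sketch; the summary does not state which threshold, if any, the authors use.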

    A separate training phase is necessary to apply distinct smooth labels to the identified data groups. This step ensures that the model treats high-quality and noisy inputs differently, preventing the latter from impairing the final classification accuracy.
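
    To make the idea of group-specific smooth labels concrete, the sketch below builds a standard label-smoothed target and selects the smoothing factor according to whether a sample was flagged as out-of-distribution. The smoothing values (0.1 and 0.4), the choice to smooth the out-of-distribution group more strongly, and the function names are assumptions for illustration; the summary says only that the two groups receive different smooth labels.

```python
import math


def smooth_label(true_class, num_classes, epsilon):
    """Standard label smoothing: mix a one-hot target with a uniform one."""
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def target_for_sample(true_class, num_classes, is_ood, eps_id=0.1, eps_ood=0.4):
    """Pick the smoothing factor by group (values here are hypothetical)."""
    epsilon = eps_ood if is_ood else eps_id
    return smooth_label(true_class, num_classes, epsilon)


def cross_entropy(predicted_probs, target):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -sum(t * math.log(p) for p, t in zip(predicted_probs, target) if t > 0)


if __name__ == "__main__":
    prediction = [0.7, 0.1, 0.1, 0.1]   # made-up model output for a 4-class task
    for is_ood in (False, True):
        target = target_for_sample(0, 4, is_ood)
        print(is_ood, target, cross_entropy(prediction, target))
```

    With a softer target, a confidently wrong or ambiguous augmented image contributes a weaker push toward its nominal class, which is one plausible reading of why unequal treatment helps.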

    The framework relies on the loss distribution of the training samples to sort them into the two groups. Loss values give the system a quantitative basis for distinguishing reliable augmented samples from those that deviate from the expected distribution.

    The authors measure classification performance across several benchmarks. They compare their method against standard augmentation strategies like RandAugment, rotate, and jigsaw, showing that their approach consistently improves results across these different techniques.

    The researchers state that their method can be easily extended to existing augmentation strategies. According to the authors, properly handling intentionally created out-of-distribution samples largely improves the classification performance of negative data augmentation.