Related Concept Videos

Residuals and Least-Squares Property

A residual is the vertical distance between the actual value of y and the estimated value of y. In other words, it measures the vertical distance between the actual data point and the predicted point on the line.
If the observed data point lies above the line, the residual is positive, and the line underestimates the actual data value for y. If the observed data point lies below the line, the residual is negative, and the line overestimates the actual data value for y.
The process of fitting the best-fit...

Multi-input and Multi-variable systems

Cruise control systems in cars are designed as multi-input systems to maintain a driver's desired speed while compensating for external disturbances such as changes in terrain. The block diagram for a cruise control system typically includes two main inputs: the desired speed set by the driver and any external disturbances, such as the incline of the road. By adjusting the engine throttle, the system maintains the vehicle's speed as close to the desired value as possible.
In the absence...

Associative Learning

Associative learning is a fundamental concept in behavioral psychology, wherein a connection is established between two stimuli or events, leading to a learned response. This process is critical in understanding how behaviors are acquired and modified. Conditioning, the mechanism through which associations are formed, can be divided into two main types: classical conditioning and operant conditioning, each elucidating different aspects of associative learning.
Classical conditioning, also known...

Observational Learning

Albert Bandura's observational learning, also known as imitation or modeling, occurs when a person observes and imitates another's behavior. It is a quicker process than operant conditioning. A well-known example is the Bobo doll study, where children who saw an adult acting aggressively towards the doll were more likely to act aggressively when left alone, compared to those who observed a nonaggressive adult. Many psychologists view observational learning as a form of latent learning...

Improving Translational Accuracy

Base complementarity across the three base pairs formed between the mRNA codon and the tRNA anticodon is not a failsafe mechanism. Inaccuracies can range from a single mismatch to no correct base pairing at all. The free energy difference between correct and nearly correct base pairs can be as small as 3 kcal/mol. If complementarity were the only proofreading step, the estimated error frequency would be one wrong amino acid in every 100 amino acids incorporated. However, error frequencies observed in...

Regression Toward the Mean

Regression toward the mean (“RTM”) is a phenomenon in which extremely high or low values (for example, an individual’s blood pressure at a particular moment) appear closer to a group’s average upon remeasuring. Although this statistical peculiarity is the result of random error and chance, it has been problematic across various medical, scientific, financial, and psychological applications. In particular, RTM, if not taken into account, can interfere when...

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Development and Validation of a Nomogram to Predict Liver Metastases in Patients With Gastric Cancer.

Indian journal of surgical oncology·2026

Same author

One-Shot Pd(II)-Catalyzed Multiple C-H Activation Enables Modular Construction of Fluorenylidene Oxindole-Based Multi(Polycyclic) Aromatic Enes.

Chemistry (Weinheim an der Bergstrasse, Germany)·2026

Same author

Assessing the reliability and quality of avascular necrosis of the femoral head content on social media: a cross-sectional content analysis.

Scientific reports·2026

Same author

Development of a machine learning-based mortality prediction model for patients with mental disorders and COVID-19.

Frontiers in cellular and infection microbiology·2026

Same author

Targeted affinity fishing of components from the n-butanol extract of Gualou-Xiebai-Banxia decoction for the FGF21/FGFR1/βKlotho-FRS2α pathway and verification of their activities.

Journal of chromatography. B, Analytical technologies in the biomedical and life sciences·2026

Same author

Evolutionary insight and characterization of WOX genes in callus development and differentiation of Peucedanum praeruptorum.

Planta·2026

Same journal

Exploiting audio-visual modalities in videos: Object detection via multi-stage bilateral coupling network.

Neural networks : the official journal of the International Neural Network Society·2026

Same journal

Reliability-aware modality completion with cross-modal distillation for federated learning with missing modalities.

Neural networks : the official journal of the International Neural Network Society·2026

Same journal

IGFD-Net: Illumination-guided frequency decoupling for polarization image fusion.

Neural networks : the official journal of the International Neural Network Society·2026

Same journal

Multiple-Strategies dung beetle optimizer and its applications in engineering optimization and bankruptcy prediction.

Neural networks : the official journal of the International Neural Network Society·2026

Same journal

Aggregating global-scale pixel-wise forgery cues within a graph.

Neural networks : the official journal of the International Neural Network Society·2026

Same journal

Finite-Time intermittent control for secure synchronization of Neutral-Type stochastic delayed neural networks under aperiodic DoS attacks.

Neural networks : the official journal of the International Neural Network Society·2026

Related Experiment Video

Updated: Sep 9, 2025

Deep Neural Networks for Image-Based Dietary Assessment

Published on: March 13, 2021

Rethinking softmax in incremental learning.

Zheng Zhai¹, Jiali Zhang², Haiyu Wang³

¹Department of Statistics, Faculty of Arts and Sciences, Beijing Normal University, Zhuhai, Guangdong, China.

Neural Networks : the Official Journal of the International Neural Network Society

September 1, 2025

Summary

This summary is machine-generated.

This study addresses catastrophic forgetting in incremental learning by introducing new distillation losses. The proposed methods improve accuracy and reduce forgetting in incremental learning models.

Keywords:

Catastrophic forgetting, Continual learning, Distillation loss, Incremental learning, Life-long learning

More Related Videos

Augmenting Large Language Models via Vector Embeddings to Improve Domain-Specific Responsiveness

Published on: December 6, 2024

Area of Science:

Machine Learning
Artificial Intelligence
Deep Learning

Background:

Catastrophic forgetting is a major hurdle in incremental learning, where models forget previously learned information when trained on new data.
The standard softmax cross-entropy distillation loss suffers from non-identifiability, hindering effective incremental learning.
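A minimal sketch of why a softmax-based distillation loss can be non-identifiable, assuming (as the "shift-sensitive alternatives" under Main Methods suggest) that the issue is softmax's invariance to adding the same constant to every logit. The helper names, the temperature of 2.0, and the example logits are illustrative assumptions, not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_ce_distillation(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy from the teacher's softened distribution to the student's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

teacher = [2.0, 0.5, -1.0]
student = [1.8, 0.7, -0.9]
shifted = [z + 5.0 for z in student]  # every student logit moved by the same constant

# Both student logit vectors give (numerically) the same distillation loss,
# so the loss cannot tell them apart: the logits are not identifiable from it.
loss_original = softmax_ce_distillation(student, teacher)
loss_shifted = softmax_ce_distillation(shifted, teacher)
print(abs(loss_original - loss_shifted) < 1e-9)  # True
```

A loss that compares raw logits directly, such as a squared error between student and teacher logits, would distinguish `student` from `shifted`; that is the sense in which such a term is shift-sensitive.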

Purpose of the Study:

To propose novel strategies to mitigate catastrophic forgetting in incremental learning.
To address the non-identifiability issue in softmax cross-entropy distillation loss.

Main Methods:

Introduced an imbalance-invariant distillation loss to counteract imbalanced weights during distillation.
Regularized the prediction and distillation losses with shift-sensitive alternatives to make the problem identifiable.
Developed five novel approaches that integrate into existing frameworks such as LWF, LWM, and LUCIR.
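To show where a distillation term sits in an LwF-style framework, here is a hypothetical per-example objective: classification cross-entropy over all classes, softmax distillation on the old classes, and a mean-squared-error term on old-class logits standing in for a shift-sensitive regularizer. The function name, the weights `lam` and `mu`, and the choice of MSE are assumptions for illustration; this is not the paper's imbalance-invariant loss or any of its five approaches.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]

def incremental_loss(student_logits, label, teacher_old_logits,
                     temperature=2.0, lam=1.0, mu=0.1):
    """Hypothetical LwF-style objective for a single example.

    student_logits: logits over old + new classes from the model being trained.
    label: index of the ground-truth class.
    teacher_old_logits: logits over the old classes from the frozen previous model.
    """
    n_old = len(teacher_old_logits)
    # 1) Classification cross-entropy over all classes.
    ce = -log_softmax(student_logits)[label]
    # 2) Softmax cross-entropy distillation restricted to the old classes.
    student_old = student_logits[:n_old]
    log_p_student = log_softmax(student_old, temperature)
    p_teacher = [math.exp(v) for v in log_softmax(teacher_old_logits, temperature)]
    kd = -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))
    # 3) Shift-sensitive term (hypothetical choice): MSE on raw old-class logits.
    mse = sum((s - t) ** 2 for s, t in zip(student_old, teacher_old_logits)) / n_old
    return ce + lam * kd + mu * mse
```

Hinton-style knowledge distillation often multiplies the distillation term by the squared temperature; that scaling is omitted here to keep the sketch short.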

Main Results:

Demonstrated consistent improvements in predictive accuracy across multiple incremental learning frameworks.
Achieved substantial reductions in forgetting rates in extensive numerical experiments.
On CIFAR-100, improved average accuracy by over 11% and reduced forgetting by over 16% for LWF, LWM, and LUCIR.
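Average accuracy and forgetting are commonly computed from an accuracy matrix recorded after each incremental step. The sketch below assumes one widely used convention: average accuracy over all tasks after the final step, and forgetting as the drop from each old task's best earlier accuracy to its final accuracy. The paper's exact definitions may differ, and the example numbers are made up.

```python
def average_accuracy(acc):
    """Mean accuracy over all tasks after the final training step.

    acc[i][j] is accuracy (in %) on task j measured after training on task i,
    defined for j <= i, so row i has i + 1 entries.
    """
    final = acc[-1]
    return sum(final) / len(final)

def average_forgetting(acc):
    """Mean drop from each old task's best earlier accuracy to its final accuracy."""
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0  # nothing has been forgotten after a single task
    drops = []
    for j in range(num_tasks - 1):  # the last task cannot have been forgotten yet
        best_before = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before - acc[-1][j])
    return sum(drops) / len(drops)

# Made-up accuracies for three incremental steps.
acc = [[80.0], [70.0, 82.0], [60.0, 75.0, 85.0]]
print(average_accuracy(acc))    # mean of 60, 75, 85
print(average_forgetting(acc))  # 13.5
```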

Conclusions:

The proposed strategies effectively mitigate catastrophic forgetting in incremental learning.
The novel approaches enhance the performance of distillation-based incremental learning methods.
The research offers practical solutions for building more robust incremental learning systems.