Generalized Kullback-Leibler Divergence Loss.

Jiequan Cui, Beier Zhu, Qingshan Xu

IEEE Transactions on Pattern Analysis and Machine Intelligence

June 15, 2026

Summary

This summary is machine-generated.

This study proves that the Kullback-Leibler (KL) Divergence loss is equivalent to the Decoupled KL (DKL) Divergence loss. Two enhancements to this decoupled form yield the Generalized KL (GKL) Divergence loss, which improves adversarial robustness and knowledge distillation performance.

Area of Science:

Machine Learning
Computer Vision
Optimization

Background:

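Kullback-Leibler (KL) Divergence loss is a fundamental loss function in machine learning; it measures how far one probability distribution is from another, but it is asymmetric and therefore not a metric in the strict sense.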
Existing KL loss formulations present limitations in specific applications like knowledge distillation and adversarial training.
The decoupled structure of the Decoupled KL (DKL) Divergence loss separates the loss into distinct terms, each of which can be improved on its own.
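
For orientation, the KL Divergence loss between two softmax outputs can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the logit values and the names `teacher_logits` and `student_logits` are hypothetical and chosen only to make the example concrete.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) = sum_j p_j * log(p_j / q_j), in nats."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0.0)


# Hypothetical logits for a 3-class problem (illustration only).
teacher_logits = [2.0, 1.0, 0.1]
student_logits = [0.5, 1.5, 0.3]

p = softmax(teacher_logits)
q = softmax(student_logits)

kl_pq = kl_divergence(p, q)  # the target distribution p comes first
kl_qp = kl_divergence(q, p)  # swapping the arguments gives a different value

print(f"KL(p || q) = {kl_pq:.4f}")
print(f"KL(q || p) = {kl_qp:.4f}")
```

The two printed values differ, which shows that the KL divergence is not symmetric in its arguments.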

Purpose of the Study:

To mathematically prove the equivalence between KL Divergence loss and Decoupled KL (DKL) Divergence loss.
To enhance KL/DKL loss by addressing optimization challenges and sample bias.
To introduce a novel Generalized KL (GKL) Divergence loss.

Main Methods:

Mathematical proof of KL and DKL loss equivalence.
Modification of KL/DKL loss to break its asymmetric optimization property and to use a smoother weight function.
Integration of class-wise global information into KL/DKL loss.
Empirical evaluation on CIFAR-10/100, ImageNet, and vision-language datasets.
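
The KL/DKL equivalence can be checked numerically on a small example. The sketch below rests on several assumptions that this summary does not state: that "equivalent" means the two losses have matching gradients with respect to both sets of logits, that the weighted MSE term uses weights `s_m[j] * s_m[k]` with a factor of 1/4 over pairwise logit differences, and that the weights, the second input's logit differences, and the soft labels are held constant (a stop-gradient). All function and variable names are hypothetical.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(z):
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def kl_loss(o_m, o_n):
    """KL(softmax(o_m) || softmax(o_n))."""
    s_m, ls_m, ls_n = softmax(o_m), log_softmax(o_m), log_softmax(o_n)
    return sum(p * (a - b) for p, a, b in zip(s_m, ls_m, ls_n))


def dkl_loss(o_m, o_n, weights, delta_n_const, s_m_const):
    """Assumed decoupled form: weighted MSE + cross-entropy with soft labels.

    weights, delta_n_const and s_m_const are frozen constants (stop-gradient).
    """
    c = len(o_m)
    wmse = 0.25 * sum(
        weights[j][k] * ((o_m[j] - o_m[k]) - delta_n_const[j][k]) ** 2
        for j in range(c)
        for k in range(c)
    )
    ce = -sum(p * l for p, l in zip(s_m_const, log_softmax(o_n)))
    return wmse + ce


def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


# Hypothetical logits (illustration only).
o_m = [1.2, -0.4, 0.7]
o_n = [0.3, 0.9, -1.1]

# Quantities frozen at the current point, mimicking a stop-gradient.
s_m_const = softmax(o_m)
weights = [[a * b for b in s_m_const] for a in s_m_const]
delta_n_const = [[a - b for b in o_n] for a in o_n]

grad_kl_m = numerical_grad(lambda x: kl_loss(x, o_n), o_m)
grad_kl_n = numerical_grad(lambda x: kl_loss(o_m, x), o_n)
grad_dkl_m = numerical_grad(
    lambda x: dkl_loss(x, o_n, weights, delta_n_const, s_m_const), o_m
)
grad_dkl_n = numerical_grad(
    lambda x: dkl_loss(o_m, x, weights, delta_n_const, s_m_const), o_n
)

max_diff_m = max(abs(a - b) for a, b in zip(grad_kl_m, grad_dkl_m))
max_diff_n = max(abs(a - b) for a, b in zip(grad_kl_n, grad_dkl_n))
print("max gradient difference w.r.t. o_m:", max_diff_m)
print("max gradient difference w.r.t. o_n:", max_diff_n)
```

Under these assumptions the loss values themselves differ, but the gradients agree up to finite-difference error: the weighted MSE term supplies the gradient for `o_m`, and the soft-label cross-entropy term supplies the gradient for `o_n`.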

Main Results:

Proved that KL Divergence loss is equivalent to DKL loss, which consists of a weighted Mean Square Error term plus a Cross-Entropy term with soft labels.
Achieved state-of-the-art adversarial robustness on the RobustBench leaderboard.
Obtained competitive knowledge distillation performance on various models and datasets.
The proposed Generalized KL (GKL) Divergence loss shows significant practical merits.

Conclusions:

The Generalized KL (GKL) Divergence loss offers substantial improvements over standard KL and DKL losses.
GKL loss effectively enhances adversarial robustness and knowledge distillation tasks.
The findings provide a more robust and versatile loss function for deep learning applications.