Scalable and Practical Natural Gradient for Large-Scale Deep Learning.

Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno

IEEE Transactions on Pattern Analysis and Machine Intelligence

August 6, 2020

Summary

This summary is machine-generated.

Scalable and Practical Natural Gradient Descent (SP-NGD) improves deep learning model generalization during large-batch distributed training. This method accelerates convergence and maintains performance, offering a practical solution for large-scale neural network training.

Area of Science:

Deep Learning
Machine Learning Optimization
Computer Vision

Background:

Large-scale distributed training of deep neural networks often leads to decreased generalization performance due to increased effective mini-batch sizes.
Existing methods to mitigate this issue involve complex adjustments to learning rates, batch sizes, and batch normalization techniques.

Purpose of the Study:

To introduce a scalable and practical natural gradient descent (SP-NGD) method for training deep neural networks.
To enable models to reach generalization performance comparable to that of models trained with first-order optimization methods, while accelerating convergence.
To demonstrate the scalability of SP-NGD with large mini-batch sizes and minimal computational overhead.

Main Methods:

Implementation of Scalable and Practical Natural Gradient Descent (SP-NGD).
Evaluation on a ResNet-50 image classification task on the ImageNet dataset.
Comparison with highly optimized first-order optimization methods.
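For readers unfamiliar with natural gradient descent, the following is a minimal sketch of a single damped natural-gradient update, theta <- theta - lr * (F + damping * I)^-1 * grad, on a toy two-parameter logistic regression. It is not the SP-NGD implementation, whose approximations and distributed scheme are not described in this summary. The function names, the toy data, the learning rate, and the damping value are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(w, data):
    """Mean binary cross-entropy of a 2-parameter logistic model."""
    total = 0.0
    for (x0, x1), y in data:
        p = sigmoid(w[0] * x0 + w[1] * x1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(data)


def natural_gradient_step(w, data, lr=1.0, damping=1e-3):
    """One damped natural-gradient step (toy illustration, not SP-NGD)."""
    n = len(data)
    g = [0.0, 0.0]                  # mean gradient of the loss
    F = [[0.0, 0.0], [0.0, 0.0]]    # Fisher information matrix
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1])
        for i in range(2):
            g[i] += (p - y) * x[i] / n
            for j in range(2):
                # For logistic regression the Fisher is E[p(1-p) x x^T].
                F[i][j] += p * (1.0 - p) * x[i] * x[j] / n
    # Damping keeps the 2x2 matrix well conditioned before inversion.
    a, b = F[0][0] + damping, F[0][1]
    c, d = F[1][0], F[1][1] + damping
    det = a * d - b * c
    # Closed-form 2x2 inverse applied to the gradient.
    step = [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]
    return [w[0] - lr * step[0], w[1] - lr * step[1]]


# Hypothetical toy data: (bias term, feature) pairs with binary labels.
toy_data = [((1.0, 0.5), 1), ((1.0, -1.0), 0), ((1.0, 2.0), 1),
            ((1.0, 0.0), 0), ((1.0, 1.0), 0), ((1.0, -0.5), 1)]

w_start = [0.0, 0.0]
w_next = natural_gradient_step(w_start, toy_data)
print(mean_loss(w_start, toy_data), mean_loss(w_next, toy_data))
```

The step rescales the gradient by the inverse Fisher matrix, so it accounts for curvature that a plain first-order update ignores. Forming and inverting that matrix exactly is only feasible for tiny models like this one, which is why practical large-scale variants rely on approximations.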

Main Results:

SP-NGD achieved a top-1 validation accuracy of 75.4% in 5.5 minutes with a mini-batch size of 32,768 using 1,024 GPUs.
SP-NGD converged to 74.9% accuracy with an extremely large mini-batch size of 131,072 in just 873 steps.
SP-NGD exhibited negligible computational overhead compared to first-order methods.
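The reported figures can be put in perspective with some quick arithmetic. The sketch below assumes the mini-batch is split evenly across GPUs and uses the ILSVRC-2012 ImageNet training-set size of 1,281,167 images. Neither assumption is stated in this summary, and the variable names are illustrative.

```python
# Figures taken from the results above.
global_batch = 32_768
num_gpus = 1_024
large_batch = 131_072
steps = 873

# Assumption (not stated in the summary): ILSVRC-2012 training-set size.
imagenet_train_images = 1_281_167

# Assumption: the global mini-batch is sharded evenly across GPUs.
per_gpu_batch = global_batch // num_gpus        # 32 samples per GPU

# Total samples processed in the large-batch run.
samples_seen = large_batch * steps              # 114,425,856 samples

# Equivalent number of passes over the training set (just over 89).
epochs = samples_seen / imagenet_train_images

print(per_gpu_batch, samples_seen, round(epochs, 1))
```

Under these assumptions, 873 steps at a mini-batch size of 131,072 correspond to a conventional number of passes over the data, compressed into very few parameter updates.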

Conclusions:

SP-NGD is a principled and effective approach for large-scale distributed deep learning.
The method successfully addresses the generalization gap associated with large mini-batch sizes.
SP-NGD offers accelerated convergence and scalability, making it a practical choice for training deep neural networks efficiently.