



Scalable and Practical Natural Gradient for Large-Scale Deep Learning.

Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno

    IEEE Transactions on Pattern Analysis and Machine Intelligence | August 6, 2020
    Summary
    This summary is machine-generated.

    Scalable and Practical Natural Gradient Descent (SP-NGD) improves deep learning model generalization during large-batch distributed training. This method accelerates convergence and maintains performance, offering a practical solution for large-scale neural network training.


    Area of Science:

    • Deep Learning
    • Machine Learning Optimization
    • Computer Vision

    Background:

    • Large-scale distributed training of deep neural networks often leads to decreased generalization performance due to increased effective mini-batch sizes.
    • Existing methods to mitigate this issue involve complex adjustments to learning rates, batch sizes, and batch normalization techniques.

    Purpose of the Study:

    • To introduce a scalable and practical natural gradient descent (SP-NGD) method for training deep neural networks.
    • To enable models to achieve generalization performance comparable to first-order optimization methods while accelerating convergence.
    • To demonstrate the scalability of SP-NGD with large mini-batch sizes and minimal computational overhead.

    Main Methods:

    • Implementation of Scalable and Practical Natural Gradient Descent (SP-NGD).
    • Evaluation on a ResNet-50 image classification task on the ImageNet dataset.
    • Comparison with highly optimized first-order optimization methods.
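
The summary does not describe how SP-NGD is implemented, so the following is only a generic sketch of a damped natural gradient step, not the authors' method. It assumes a toy two-parameter logistic regression, where the Fisher matrix can be formed and inverted exactly. The data, the damping value, and all function names are hypothetical. The update preconditions the gradient g with the inverse of the damped Fisher matrix F, i.e. theta <- theta - lr * (F + damping * I)^(-1) g.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad_fisher(theta, xs, ys):
    """Mean log-loss, its gradient, and the Fisher matrix for 2-parameter logistic regression."""
    n = len(xs)
    loss = 0.0
    grad = [0.0, 0.0]
    fisher = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        p = sigmoid(theta[0] * x[0] + theta[1] * x[1])
        loss -= (y * math.log(p) + (1 - y) * math.log(1.0 - p)) / n
        for i in range(2):
            grad[i] += (p - y) * x[i] / n
            for j in range(2):
                # Fisher information of a Bernoulli output: p(1-p) * x x^T
                fisher[i][j] += p * (1.0 - p) * x[i] * x[j] / n
    return loss, grad, fisher

def natural_gradient_step(theta, xs, ys, lr=1.0, damping=1e-3):
    """One step: theta - lr * (F + damping*I)^-1 * grad, using a closed-form 2x2 solve."""
    _, g, f = loss_grad_fisher(theta, xs, ys)
    a, b = f[0][0] + damping, f[0][1]
    c, d = f[1][0], f[1][1] + damping
    det = a * d - b * c
    nat = [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]
    return [theta[0] - lr * nat[0], theta[1] - lr * nat[1]]

# Hypothetical toy data: x = (bias, feature); the labels are not linearly separable.
xs = [(1.0, -2.0), (1.0, -1.0), (1.0, -0.3), (1.0, 0.5), (1.0, 1.5), (1.0, 2.0)]
ys = [0, 0, 1, 0, 1, 1]

theta = [0.0, 0.0]
initial_loss, _, _ = loss_grad_fisher(theta, xs, ys)
for _ in range(20):
    theta = natural_gradient_step(theta, xs, ys)
final_loss, final_grad, _ = loss_grad_fisher(theta, xs, ys)
print(initial_loss, final_loss, final_grad)
```

For a network the size of ResNet-50 the exact Fisher matrix is far too large to build and invert this way, which is why a practical method has to approximate it; the summary gives no detail on which approximation SP-NGD uses.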

    Main Results:

    • SP-NGD achieved a top-1 validation accuracy of 75.4% in 5.5 minutes with a mini-batch size of 32,768 using 1,024 GPUs.
    • SP-NGD converged to 74.9% accuracy with an extremely large mini-batch size of 131,072 in only 873 steps.
    • SP-NGD exhibited negligible computational overhead compared to first-order methods.
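
To put the reported figures in perspective, the short calculation below derives the per-GPU batch size and the number of training samples processed in the 873-step run. It assumes the mini-batch is split evenly across GPUs and that the ImageNet-1k training set contains 1,281,167 images; neither assumption is stated in the summary.

```python
# Figures taken from the results above.
gpus = 1024
global_batch = 32_768
per_gpu_batch = global_batch // gpus  # assumes an even split across GPUs

large_batch = 131_072
steps = 873
samples_seen = large_batch * steps

# Assumption (not from the summary): ImageNet-1k training set size.
IMAGENET_TRAIN_IMAGES = 1_281_167
approx_epochs = samples_seen / IMAGENET_TRAIN_IMAGES

print(per_gpu_batch, samples_seen, round(approx_epochs, 1))
```

Under these assumptions each GPU handles 32 samples per step in the 32,768 run, and the 873-step run covers roughly 89 passes over the training set.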

    Conclusions:

    • SP-NGD is a principled and effective approach for large-scale distributed deep learning.
    • The method successfully addresses the generalization gap associated with large mini-batch sizes.
    • SP-NGD offers accelerated convergence and scalability, making it a practical choice for training deep neural networks efficiently.