Gradient-based optimization of hyperparameters.

Y Bengio¹

  • ¹ Département d'informatique et recherche opérationnelle, Université de Montréal, Montréal, Québec, Canada, H3C 3J7.

Neural Computation | August 23, 2000
Summary
This summary is machine-generated.

This study introduces a method for optimizing machine learning hyperparameters by computing the gradient of a model selection criterion with respect to those hyperparameters. The approach offers an efficient alternative to trial-and-error hyperparameter tuning.

Area of Science:

  • Machine Learning
  • Optimization
  • Computational Science

Background:

  • Many machine learning algorithms minimize a training criterion that involves one or more hyperparameters.
  • Hyperparameters are usually chosen by trial and error against a model selection criterion, which is inefficient.
  • Well-chosen hyperparameters are crucial for model performance.

Purpose of the Study:

  • To develop an efficient methodology for optimizing multiple machine learning hyperparameters.
  • To compute the gradient of a model selection criterion with respect to hyperparameters.
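
The gradient named above can be written out with the chain rule. The notation here is an assumption introduced for illustration, not taken from the page: θ denotes the model parameters, λ the hyperparameters, C the training criterion, and E the model selection criterion evaluated at the trained parameters θ*(λ).

```latex
\theta^{*}(\lambda) = \arg\min_{\theta} C(\theta, \lambda),
\qquad
\frac{dE}{d\lambda}
  = \frac{\partial E}{\partial \lambda}
  + \left( \frac{\partial \theta^{*}}{\partial \lambda} \right)^{\!\top}
    \frac{\partial E}{\partial \theta}\bigg|_{\theta = \theta^{*}(\lambda)}
```

The first term is the direct dependence of E on λ, which vanishes when the selection criterion (for example, a validation error) does not contain the hyperparameter explicitly. The second term requires the sensitivity of the trained parameters to the hyperparameters, which is the quantity the study's methods are designed to obtain.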

Main Methods:

  • The study proposes a gradient-based optimization methodology.
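  • For quadratic training criteria, the gradient of the selection criterion with respect to the hyperparameters is computed efficiently by backpropagating through the Cholesky decomposition.
  • For general training criteria, the implicit function theorem yields a formula for the hyperparameter gradient that involves second derivatives of the training criterion.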
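
The implicit-function-theorem route can be illustrated on a small case. The sketch below is not the paper's implementation: it assumes ridge regression with a single weight-decay hyperparameter `lam`, a squared-error validation loss as the selection criterion, and synthetic data; all function and variable names are hypothetical. At the trained weights the training gradient is zero, so differentiating that condition with respect to `lam` gives `dw/dlam = -H^{-1} w`, where `H` is the Hessian of the training criterion. The Cholesky factorization is used here only to solve the linear system for the weights; it does not reproduce the backpropagation-through-Cholesky procedure described for the quadratic case.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Minimize 0.5*||X w - y||^2 + 0.5*lam*||w||^2; return (w, H)."""
    H = X.T @ X + lam * np.eye(X.shape[1])  # Hessian of the training criterion
    L = np.linalg.cholesky(H)               # lower-triangular factor, H = L @ L.T
    z = np.linalg.solve(L, X.T @ y)         # solve L z = X^T y
    w = np.linalg.solve(L.T, z)             # solve L^T w = z
    return w, H


def validation_loss(w, X_val, y_val):
    """Selection criterion assumed here: 0.5 * squared validation error."""
    r = X_val @ w - y_val
    return 0.5 * float(r @ r)


def hypergradient(X, y, X_val, y_val, lam):
    """dE/dlam via the implicit function theorem."""
    w, H = fit_ridge(X, y, lam)
    grad_w = X_val.T @ (X_val @ w - y_val)  # dE/dw at the trained weights
    # Stationarity: X^T (X w - y) + lam * w = 0  =>  H dw/dlam + w = 0
    dw_dlam = -np.linalg.solve(H, w)
    # E has no explicit dependence on lam, so only the indirect term remains.
    return float(grad_w @ dw_dlam)


def finite_difference(X, y, X_val, y_val, lam, eps=1e-5):
    """Central-difference check that retrains at lam + eps and lam - eps."""
    w_hi, _ = fit_ridge(X, y, lam + eps)
    w_lo, _ = fit_ridge(X, y, lam - eps)
    diff = validation_loss(w_hi, X_val, y_val) - validation_loss(w_lo, X_val, y_val)
    return diff / (2.0 * eps)


# Synthetic data (an assumption for the demo, not from the study).
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
X_train = rng.normal(size=(40, 5))
y_train = X_train @ w_true + 0.5 * rng.normal(size=40)
X_val = rng.normal(size=(20, 5))
y_val = X_val @ w_true + 0.5 * rng.normal(size=20)

lam = 1.0
g_ift = hypergradient(X_train, y_train, X_val, y_val, lam)
g_fd = finite_difference(X_train, y_train, X_val, y_val, lam)
print("implicit-function gradient:", g_ift)
print("finite-difference gradient:", g_fd)
```

The two printed values should agree closely. The finite-difference check needs two extra trainings per hyperparameter, whereas the implicit-function gradient needs one linear solve after a single training, which is the kind of saving that motivates a gradient-based approach when several hyperparameters are tuned together.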

Main Results:

  • A computationally efficient method for hyperparameter gradient calculation is presented.
  • The methodology is applicable to both quadratic and general training criteria.
  • The approach avoids extensive trial-and-error tuning.

Conclusions:

  • The proposed gradient-based methodology offers an efficient and systematic way to optimize machine learning hyperparameters.
  • This work advances hyperparameter optimization techniques in machine learning.
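  • Replacing repeated trial runs with gradient steps may reduce the computational cost of tuning and improve model performance, particularly when several hyperparameters are optimized jointly.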