A new Q-learning algorithm based on the metropolis criterion.

Maozu Guo, Yang Liu, Jacek Malec

IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics : a Publication of the IEEE Systems, Man, and Cybernetics Society

October 27, 2004

Summary

This summary is machine-generated.

This study introduces SA-Q-learning, a Q-learning variant that applies the Metropolis criterion from simulated annealing to balance exploration and exploitation. SA-Q-learning converges faster than standard Q-learning and Boltzmann exploration, and it avoids the performance degradation caused by excessive exploration.

Area of Science:

Artificial Intelligence
Machine Learning
Reinforcement Learning

Background:

Q-learning must balance exploration and exploitation to find an optimal policy.
Pure exploitation can trap the agent in a locally optimal policy, while excessive exploration degrades performance.

Purpose of the Study:

To address the exploration-exploitation dilemma in Q-learning.
To present a modified Q-learning algorithm, SA-Q-learning, for improved policy optimization.

Main Methods:

Framing Q-learning policy optimization as a combinatorial optimization problem.
Integrating the Metropolis criterion from simulated annealing to balance exploration and exploitation.
Developing the SA-Q-learning algorithm.
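
The summary does not give the algorithm's exact steps, so the following is only a minimal sketch of how a Metropolis-style acceptance test could drive action selection in tabular Q-learning. It assumes a uniformly random proposal is compared against the greedy action and accepted with probability exp(ΔQ / T). The function names, the `temperature` parameter, and the default `alpha` and `gamma` values are hypothetical and are not taken from the paper.

```python
import math
import random


def metropolis_action(q_row, temperature, rng):
    """Pick an action for one state with a Metropolis-style acceptance test.

    q_row: list of Q-values for the current state, one entry per action.
    temperature: positive float; higher values make exploration more likely.
    rng: a random.Random instance (passed in so runs are reproducible).
    """
    greedy = max(range(len(q_row)), key=q_row.__getitem__)
    candidate = rng.randrange(len(q_row))  # uniformly random proposal
    delta = q_row[candidate] - q_row[greedy]  # never positive
    # Assumed rule: accept the proposal with probability exp(delta / T).
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update, applied in place to table q."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


if __name__ == "__main__":
    rng = random.Random(0)
    q_row = [0.0, 1.0, 0.5]
    # Near-zero temperature behaves greedily; a large one is close to uniform.
    print(metropolis_action(q_row, 1e-9, rng))
    print(metropolis_action(q_row, 1e9, rng))
```

In this sketch a caller would lower `temperature` over time, for example by multiplying it by a factor slightly below 1 after each episode, so that selection shifts from exploration toward exploitation. That cooling schedule is an assumption and is not described in the summary.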

Main Results:

SA-Q-learning demonstrates faster convergence compared to standard Q-learning and Boltzmann exploration.
The proposed method avoids performance degradation associated with excessive exploration.
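
For context on the Boltzmann exploration baseline named above, here is a minimal sketch of the usual softmax action-selection rule, in which each action is chosen with probability proportional to exp(Q / T). The summary does not say how the baseline was configured in the study, so the function names and the `temperature` parameter are hypothetical.

```python
import math
import random


def boltzmann_probs(q_row, temperature):
    """Softmax over Q-values; subtracting the max keeps exp() from overflowing."""
    top = max(q_row)
    weights = [math.exp((q - top) / temperature) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_action(q_row, temperature, rng):
    """Sample one action index according to the softmax probabilities."""
    probs = boltzmann_probs(q_row, temperature)
    return rng.choices(range(len(q_row)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(boltzmann_probs([1.0, 1.0], 1.0))
    print(boltzmann_action([0.0, 1.0, 0.5], 0.5, rng))
```

Unlike a Metropolis-style test, which compares one proposal against the greedy action, this rule assigns a nonzero probability to every action at every step while the temperature is positive.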

Conclusions:

SA-Q-learning effectively balances exploration and exploitation in Q-learning.
The algorithm offers a more efficient and robust approach to finding optimal policies in reinforcement learning.