A new Q-learning algorithm based on the metropolis criterion.

Maozu Guo, Yang Liu, Jacek Malec

    IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics : a Publication of the IEEE Systems, Man, and Cybernetics Society
    October 27, 2004
    Summary
    This summary is machine-generated.

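    This study introduces SA-Q-learning, a Q-learning variant that uses the Metropolis criterion from simulated annealing to balance exploration and exploitation. In the reported experiments, SA-Q-learning converges faster than standard Q-learning and Boltzmann exploration, without the performance degradation that excessive exploration causes.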

    Area of Science:

    • Artificial Intelligence
    • Machine Learning
    • Reinforcement Learning

    Background:

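    • Q-learning must balance exploration (trying actions whose value is still uncertain) against exploitation (choosing the action with the highest current Q-value) in order to find an optimal policy.
    • Pure exploitation can trap the agent in a locally optimal policy, while excessive exploration degrades performance.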
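To make the two extremes concrete, here is a minimal tabular sketch, not taken from the paper: the standard one-step Q-learning update and a purely greedy action choice. The function names `q_update` and `greedy_action`, the dictionary-based Q-table, and the default step size `alpha` and discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]

def greedy_action(Q, state, actions):
    """Pure exploitation: always return the action with the highest current Q-value."""
    return max(actions, key=lambda a: Q[(state, a)])

# Usage: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)
q_update(Q, state=0, action=1, reward=1.0, next_state=1, actions=[0, 1])
print(greedy_action(Q, 0, [0, 1]))  # 1
```

Because `greedy_action` never picks a lower-valued action, an action whose early estimate happens to be low may never be tried again; this is the local-optimum trap described in the background.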

    Purpose of the Study:

    • To address the exploration-exploitation dilemma in Q-learning.
    • To present a modified Q-learning algorithm, SA-Q-learning, for improved policy optimization.

    Main Methods:

    • Framing Q-learning policy optimization as a combinatorial optimization problem.
    • Integrating the Metropolis criterion from simulated annealing to balance exploration and exploitation.
    • Developing the SA-Q-learning algorithm.
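The Metropolis-based action choice can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' published pseudocode: at each step a candidate action is drawn at random and compared with the greedy action. The candidate is always accepted if it is at least as good, and otherwise accepted with probability exp(dQ/T), where dQ = Q(s, candidate) - Q(s, greedy) and T is a temperature. The names `metropolis_select` and `cool`, the geometric cooling rate, and the temperature floor are hypothetical choices made for illustration.

```python
import math
import random

def metropolis_select(Q, state, actions, temperature, rng=random):
    """Pick an action with a Metropolis acceptance test (illustrative sketch).

    A random candidate is compared with the greedy action. A candidate that is
    at least as good is always taken; a worse one is taken with probability
    exp(delta / temperature), where delta < 0.
    """
    greedy = max(actions, key=lambda a: Q[(state, a)])
    candidate = rng.choice(actions)
    delta = Q[(state, candidate)] - Q[(state, greedy)]  # never positive: greedy is the argmax
    if delta >= 0:
        return candidate  # tie with the greedy action
    if temperature > 0 and rng.random() < math.exp(delta / temperature):
        return candidate  # accept a worse action: exploration
    return greedy  # reject the candidate: exploitation

def cool(temperature, rate=0.95, minimum=1e-3):
    """Hypothetical geometric cooling schedule: shrink T but keep it above a floor."""
    return max(temperature * rate, minimum)

# Usage: a hot temperature lets the worse action through; a cold one does not.
Q = {("s", "left"): 1.0, ("s", "right"): 0.0}
rng = random.Random(0)
hot = [metropolis_select(Q, "s", ["left", "right"], 100.0, rng) for _ in range(1000)]
cold = [metropolis_select(Q, "s", ["left", "right"], 1e-3, rng) for _ in range(1000)]
print(hot.count("right") > 0, cold.count("right"))  # True 0
```

In this sketch the temperature acts as the exploration knob: early on, a large T lets worse actions through, and as `cool` lowers T, exp(dQ/T) shrinks toward zero so selection approaches pure exploitation. The summary does not state how the paper schedules T, so the geometric schedule here is only a placeholder.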

    Main Results:

    • SA-Q-learning converges faster than both standard Q-learning and Q-learning with Boltzmann exploration.
    • The proposed method avoids performance degradation associated with excessive exploration.
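For comparison, Boltzmann (softmax) exploration, one of the baselines named in the results, samples every action with probability proportional to exp(Q(s, a)/T). The sketch below is a generic textbook version, not the configuration used in the paper's experiments; the function names `boltzmann_probs` and `boltzmann_select` are hypothetical.

```python
import math
import random

def boltzmann_probs(Q, state, actions, temperature):
    """Softmax probabilities P(a) = exp(Q(s,a)/T) / sum_b exp(Q(s,b)/T)."""
    qs = [Q[(state, a)] for a in actions]
    m = max(qs)  # subtracting the max avoids overflow without changing the result
    weights = [math.exp((q - m) / temperature) for q in qs]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann_select(Q, state, actions, temperature, rng=random):
    """Sample one action from the Boltzmann distribution over Q-values."""
    probs = boltzmann_probs(Q, state, actions, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]

Q = {("s", "left"): 1.0, ("s", "right"): 0.0}
print([round(p, 3) for p in boltzmann_probs(Q, "s", ["left", "right"], 1.0)])  # [0.731, 0.269]
```

In these sketches the two strategies differ structurally: Boltzmann exploration spreads probability over all actions at every step, whereas a Metropolis-style rule compares a single random candidate against the greedy action and either accepts or rejects it.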

    Conclusions:

    • SA-Q-learning effectively balances exploration and exploitation in Q-learning.
    • The algorithm offers a more efficient way to find optimal policies in reinforcement learning than the exploration strategies it was compared against.