Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning.

Shota Ohnishi¹, Eiji Uchibe², Yotaro Yamaguchi³

¹Department of Systems Science, Graduate School of Informatics, Kyoto University, Now Affiliated With Panasonic Co., Ltd., Kyoto, Japan.

Frontiers in Neurorobotics

January 11, 2020

Summary

This summary is machine-generated.

Constrained Deep Q Network (Constrained DQN) places a constraint on target values to make deep reinforcement learning more stable and sample-efficient. It converges with a smaller training dataset than standard DQN and is robust to changes in the target network update frequency and in optimizer parameter settings.

Keywords:

constrained reinforcement learning, deep Q network, deep reinforcement learning, learning stabilization, regularization, target network

Area of Science:

Artificial Intelligence
Machine Learning
Deep Reinforcement Learning

Background:

Deep Q Network (DQN) extends Q-learning by using convolutional neural networks to approximate the Q function, from which a policy is derived.
DQN computes target values with a separate target network to stabilize learning, but because that network is updated only infrequently, a large number of samples is needed to propagate target values.
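
The cost of infrequent target updates can be seen in a small sketch. This is a simplified illustration, not the paper's implementation: it swaps the convolutional network for a plain Q table, and the names `td_target`, `train`, and `sync_every` are hypothetical. The target value is bootstrapped from a frozen copy of the Q table that is refreshed only every `sync_every` steps, so a newly learned value cannot reach earlier states until the next refresh.

```python
import copy


def td_target(reward, next_q_values, gamma, done):
    """Bootstrapped target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def train(transitions, n_states, n_actions, gamma=0.9, lr=0.5, sync_every=4):
    """Tabular stand-in for DQN with a periodically synced target table.

    Assumption: a lookup table replaces the neural network so the
    example stays self-contained; `sync_every` is a hypothetical name
    for the target update interval.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    q_target = copy.deepcopy(q)  # frozen copy used only for targets
    for step, (s, a, r, s_next, done) in enumerate(transitions, start=1):
        y = td_target(r, q_target[s_next], gamma, done)
        q[s][a] += lr * (y - q[s][a])
        if step % sync_every == 0:
            q_target = copy.deepcopy(q)  # infrequent target update
    return q, q_target


# Two-state chain: state 1 ends the episode with reward 1,
# state 0 leads to state 1 with reward 0.
chain = [(1, 0, 1.0, 1, True), (0, 0, 0.0, 1, False)]
slow_q, _ = train(chain, n_states=2, n_actions=1, sync_every=4)
fast_q, _ = train(chain, n_states=2, n_actions=1, sync_every=1)
```

With `sync_every=4` the value learned in state 1 has not yet reached state 0 after these two transitions, whereas with `sync_every=1` it has. Frequent syncing is shown here only to expose the propagation delay; it is the setting that a target network is meant to avoid for stability.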

Purpose of the Study:

To introduce Constrained DQN, a novel method that enhances learning efficiency and stability in deep reinforcement learning.
To address the sample inefficiency of standard DQN by introducing a constraint mechanism for target value updates.

Main Methods:

Proposed Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value.
Implemented conservative parameter updates when this difference is large and aggressive updates when it is small.
Observed that the constraint is activated less often as learning progresses, so the method gradually approaches conventional Q-learning.
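
One way to picture the mechanism described above is a per-sample loss that switches on a penalty only when the Q function drifts too far from the target network. The sketch below is an assumption-laden illustration, not the authors' formulation: the threshold `eta`, the weight `lam`, the quadratic penalty, and the function name `constrained_loss` are all hypothetical choices made for this example.

```python
def constrained_loss(q_sa, q_next_online, q_target_sa, reward,
                     gamma=0.99, eta=1.0, lam=2.0):
    """Illustrative constrained Q-learning loss for one transition.

    q_sa          -- online estimate Q(s, a)
    q_next_online -- online estimates Q(s', a') for every action a'
    q_target_sa   -- target network output for the same (s, a)
    eta, lam      -- hypothetical threshold and penalty weight
    Returns (loss, constraint_active).
    """
    # Ordinary Q-learning target, bootstrapped from the online estimates.
    y = reward + gamma * max(q_next_online)
    td_loss = 0.5 * (y - q_sa) ** 2

    # Difference between Q function and target network outputs.
    gap = abs(q_sa - q_target_sa)
    active = gap > eta

    # Large gap: add a penalty pulling Q(s, a) back toward the target
    # network (conservative update). Small gap: plain Q-learning loss
    # (aggressive update).
    penalty = lam * 0.5 * (q_sa - q_target_sa) ** 2 if active else 0.0
    return td_loss + penalty, active


# Small gap: the constraint is inactive and the loss is the plain TD loss.
loose = constrained_loss(1.0, [0.5, 2.0], 1.2, 1.0, gamma=0.5)
# Large gap: the constraint is active and the penalty dominates.
tight = constrained_loss(1.0, [0.5, 2.0], 4.0, 1.0, gamma=0.5)
```

Under these assumptions, tracking how often `constraint_active` is true over training would show the behavior noted in the last item: as the two outputs come to agree, the penalty fires less often and the update reduces to ordinary Q-learning.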

Main Results:

Constrained DQN converges with a significantly smaller training dataset compared to standard DQN.
The proposed method demonstrates robustness against variations in target network update frequency and optimizer parameter settings.
Experimental results indicate that Constrained DQN can be used as an additional component within existing integrated approaches and distributed methods.

Conclusions:

Constrained DQN offers improved sample efficiency and stability over standard DQN.
The method is adaptable and can enhance the performance of other advanced deep reinforcement learning techniques.
Constrained DQN presents a valuable component for developing more efficient and robust reinforcement learning systems.