
Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning.

Shota Ohnishi¹, Eiji Uchibe², Yotaro Yamaguchi³

  • ¹ Department of Systems Science, Graduate School of Informatics, Kyoto University (now affiliated with Panasonic Co., Ltd.), Kyoto, Japan.

Frontiers in Neurorobotics | January 11, 2020
Summary
This summary is machine-generated.

Constrained Deep Q Network (Constrained DQN) places a constraint on the target value to make deep reinforcement learning more stable and sample-efficient. The method converges with less training data than standard DQN and is robust to parameter tuning.

Keywords:
constrained reinforcement learning; deep Q network; deep reinforcement learning; learning stabilization; regularization; target network


Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Deep Reinforcement Learning

Background:

  • Deep Q Network (DQN) extends Q-learning by approximating the Q function with a convolutional neural network; the optimal policy is then derived from the approximated Q function.
  • DQN stabilizes learning with a target network that is updated only at intervals, but because target values propagate only when the target network is updated, DQN typically needs large amounts of training data.
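To make the propagation issue concrete, here is a minimal tabular sketch (not the authors' code). It assumes a hypothetical chain environment in which a single action moves state s to s + 1 and leaving the last state pays reward 1. A Q table stands in for the network and is fit exactly to the DQN-style targets y = r + γ max_a' Q_target(s', a') on every step, while the target table is hard-copied from the Q table every `sync_every` steps. The names `td_targets`, `steps_until_start_state_learns`, `sync_every`, and the discount 0.9 are illustrative assumptions.

```python
from typing import List


def td_targets(q_target: List[float], gamma: float) -> List[float]:
    """One-step targets y(s) = r(s) + gamma * max_a' Q_target(s', a') on a toy chain.

    Hypothetical toy chain: a single action moves s -> s + 1, and leaving the last
    state pays reward 1 and ends the episode. With one action, the max over a'
    is just the target-network value of the next state.
    """
    n_states = len(q_target)
    targets = []
    for s in range(n_states):
        if s == n_states - 1:
            targets.append(1.0)  # terminal transition: reward 1, no bootstrap
        else:
            reward = 0.0
            targets.append(reward + gamma * q_target[s + 1])
    return targets


def steps_until_start_state_learns(n_states: int, sync_every: int,
                                   gamma: float = 0.9, max_steps: int = 10_000) -> int:
    """Count full-sweep updates until the start state's value becomes positive."""
    q = [0.0] * n_states         # online Q table (stand-in for the Q network)
    q_target = [0.0] * n_states  # frozen copy used only to compute targets
    for step in range(1, max_steps + 1):
        q = td_targets(q_target, gamma)  # fit Q exactly to the current targets
        if step % sync_every == 0:
            q_target = list(q)           # periodic hard update of the target network
        if q[0] > 0.0:
            return step
    return -1  # reward never reached the start state within max_steps


if __name__ == "__main__":
    for c in (1, 5, 20):
        print(c, steps_until_start_state_learns(10, c))
```

In this toy setting the reward advances one state per target-network update, so it reaches the start of an n-state chain after (n - 1) · `sync_every` + 1 steps: 10, 46, and 181 steps for a 10-state chain with `sync_every` of 1, 5, and 20.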

Purpose of the Study:

  • To introduce Constrained DQN, a novel method that enhances learning efficiency and stability in deep reinforcement learning.
  • To address the sample inefficiency of standard DQN by introducing a constraint mechanism for target value updates.

Main Methods:

  • Proposed Constrained DQN, which uses the difference between the outputs of the Q function and the target network as a constraint on the target value.
  • Updated parameters conservatively when this difference is large and aggressively when it is small.
  • Observed that the constraint is activated less often as learning progresses, so the update rule gradually approaches conventional Q-learning.
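This summary does not give the exact form of the constraint. One hedged reading of the methods above, sketched here purely as an assumption, computes the bootstrap value from the current Q function (as in ordinary Q-learning) but clips it to within a hypothetical margin `eta` of the target network's value (as in DQN). Clipping is a stand-in chosen for illustration, not necessarily the authors' formulation; `constrained_targets` and `eta` are hypothetical names.

```python
from typing import List, Tuple


def constrained_targets(
    rewards: List[float],
    dones: List[bool],
    next_q_online: List[List[float]],
    next_q_target: List[List[float]],
    gamma: float,
    eta: float,
) -> Tuple[List[float], float]:
    """Hypothetical constrained target: clip the online bootstrap to within eta of the target network's.

    Returns the targets and the fraction of bootstrapped transitions where the
    constraint was active (i.e. clipping changed the value).
    """
    targets: List[float] = []
    active = 0
    bootstrapped = 0
    for r, done, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        if done:
            targets.append(r)  # terminal: no bootstrap, constraint irrelevant
            continue
        bootstrapped += 1
        v_online = max(q_on)  # ordinary Q-learning bootstrap (aggressive)
        v_target = max(q_tg)  # DQN-style target-network bootstrap (conservative)
        v = min(max(v_online, v_target - eta), v_target + eta)
        if v != v_online:
            active += 1  # online and target networks disagree by more than eta
        targets.append(r + gamma * v)
    rate = active / bootstrapped if bootstrapped else 0.0
    return targets, rate


if __name__ == "__main__":
    targets, rate = constrained_targets(
        rewards=[0.0, 0.0, 1.0],
        dones=[False, False, True],
        next_q_online=[[1.0, 0.5], [5.0, 0.0], [0.0, 0.0]],
        next_q_target=[[0.9, 0.2], [1.0, 0.0], [0.0, 0.0]],
        gamma=0.9,
        eta=0.5,
    )
    print(targets, rate)
```

Under this assumption, a transition where the two networks agree to within `eta` gets exactly the ordinary Q-learning target, while a large disagreement pulls the target back toward the target network. As the networks converge, the activation rate falls, which mirrors the gradual approach to conventional Q-learning reported for Constrained DQN.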

Main Results:

  • Constrained DQN converges with a significantly smaller training dataset compared to standard DQN.
  • The proposed method demonstrates robustness against variations in target network update frequency and optimizer parameter settings.
  • Experimental results indicate that Constrained DQN can serve as an additional component of existing approaches, such as integrated and distributed methods.

Conclusions:

  • Constrained DQN offers improved sample efficiency and stability over standard DQN.
  • The method is adaptable and can enhance the performance of other advanced deep reinforcement learning techniques.
  • Constrained DQN presents a valuable component for developing more efficient and robust reinforcement learning systems.