Policy Gradient From Demonstration and Curiosity.

Jie Chen, Wenjun Xu

IEEE Transactions on Cybernetics

March 3, 2022

Summary

This summary is machine-generated.

This study introduces a reinforcement learning algorithm that improves exploration and learns from a limited number of expert demonstrations. The method improves agent performance in tasks with sparse rewards by adding two terms to the reward used by the policy gradient: the divergence between the current policy and the expert demonstrations, and an estimate of the agent's uncertainty about the environment.

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Reinforcement learning (RL) agents can learn complex behaviors from abstract descriptions of a task.
Exploration and reward shaping remain challenging in RL, especially when extrinsic feedback is sparse.
Existing methods often require numerous high-quality expert demonstrations, which are difficult to obtain.

Purpose of the Study:

To propose an integrated policy gradient algorithm for enhanced exploration and intrinsic reward learning.
To address the challenge of learning from a limited number of expert demonstrations in RL.
To improve agent performance in environments with sparse reward signals.
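
For readers unfamiliar with the term, the following is a minimal sketch of a generic policy gradient (REINFORCE-style) update for a softmax policy over discrete actions. It is not the integrated algorithm proposed in the study; the function names, the learning rate, and the single-step setting are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, action, ret):
    """Score-function gradient estimate for one sampled action.

    For a softmax policy, the derivative of log pi(action) with respect
    to logit k is (1 if k == action else 0) - pi(k). REINFORCE scales
    this by the return observed after taking the action.
    """
    probs = softmax(logits)
    return [ret * ((1.0 if k == action else 0.0) - p)
            for k, p in enumerate(probs)]


def ascent_step(logits, action, ret, lr=0.1):
    """One gradient-ascent update of the policy logits (lr is hypothetical)."""
    grad = reinforce_gradient(logits, action, ret)
    return [x + lr * g for x, g in zip(logits, grad)]


if __name__ == "__main__":
    logits = [0.0, 0.0]
    # Action 0 was sampled and earned a return of 1.0.
    updated = ascent_step(logits, action=0, ret=1.0)
    print(softmax(updated))
```

With a sparse extrinsic reward, `ret` is zero for most sampled actions, so the estimated gradient is zero and the policy barely changes. That is the difficulty the additional reward terms described under Main Methods are meant to address.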

Main Methods:

Developed an integrated policy gradient algorithm.
Reformulated the reward function with a term measuring the Jensen-Shannon divergence between the current policy and the expert demonstrations.
Added a second reward term that estimates the agent's uncertainty about the environment.
Evaluated the algorithm on simulated tasks with sparse rewards and limited demonstrations.
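
The reward reformulation listed above can be illustrated with a small sketch. It computes the Jensen-Shannon divergence directly between two discrete distributions and combines it with an uncertainty bonus. This summary does not state how the study estimates the divergence, how the uncertainty is measured, or how the terms are weighted, so the additive form, the signs, and the weights `w_demo` and `w_curiosity` below are assumptions, and all function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions.

    Terms with p_i == 0 contribute 0 by convention.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def shaped_reward(extrinsic, policy_probs, expert_probs, uncertainty,
                  w_demo=1.0, w_curiosity=0.1):
    """Assumed additive shaping: extrinsic reward, minus a penalty for
    diverging from the expert, plus a bonus for visiting uncertain states.
    """
    demo_penalty = js_divergence(policy_probs, expert_probs)
    return extrinsic - w_demo * demo_penalty + w_curiosity * uncertainty


if __name__ == "__main__":
    policy = [0.7, 0.2, 0.1]   # hypothetical action distribution of the agent
    expert = [0.1, 0.2, 0.7]   # hypothetical action distribution of the expert
    print(js_divergence(policy, expert))
    print(shaped_reward(0.0, policy, expert, uncertainty=2.0))
```

Even when the extrinsic reward is zero, the shaped reward still varies with how closely the policy matches the demonstrations and with how unfamiliar the current state is, which gives the policy gradient a usable learning signal.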

Main Results:

Demonstrated superior exploration efficiency across all tested tasks.
Achieved high average returns in environments with sparse extrinsic rewards.
Showcased the agent's ability to imitate expert behavior effectively.
Validated the algorithm's performance with limited expert trajectories.

Conclusions:

The proposed algorithm effectively boosts exploration and intrinsic reward learning in RL.
Limited expert demonstrations can be leveraged for improved agent performance.
The method balances imitation of expert behavior with maintaining high returns.