Diversity Evolutionary Policy Deep Reinforcement Learning.

Jian Liu1,2, Liming Feng1,2

  • 1School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China.

Computational Intelligence and Neuroscience | August 16, 2021
Summary
This summary is machine-generated.

Reinforcement learning agents can get stuck in local optima. This study introduces a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm to enhance exploration and improve performance in continuous control tasks.

Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Robotics

Background:

  • Policy gradient reinforcement learning algorithms can become trapped in local optima when gradients vanish, which hinders agent exploration.
  • Existing methods struggle with maintaining policy diversity, limiting performance in complex continuous control tasks.

Purpose of the Study:

  • To propose a novel algorithm, Diversity Evolutionary Policy Deep Reinforcement Learning (DEPRL), to address the local optima problem in reinforcement learning.
  • To enhance the exploration capabilities of reinforcement learning agents by promoting policy diversity.

Main Methods:

  • Combined the Cross-Entropy Method (CEM), Maximum Mean Discrepancy (MMD), and the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
  • Used MMD to measure the distance between policies, so that gradient updates maximize both return and inter-policy distance.
  • Incorporated cumulative returns and policy distance into population fitness to promote offspring diversity.
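
The diversity-aware fitness described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian (RBF) kernel, its bandwidth, the additive weighting of return and distance, and the function names (`mmd_squared`, `diversity_fitness`, `cem_update`) are all assumptions made for illustration, and the TD3 gradient step is omitted entirely. Each policy is represented here only by a batch of actions it produced, which stands in for whatever samples the paper compares.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq = (np.sum(x**2, axis=1)[:, None]
          + np.sum(y**2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two sample sets."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())

def diversity_fitness(returns, action_batches, weight=1.0):
    """Assumed fitness: return plus weighted mean MMD^2 to the other policies."""
    n = len(action_batches)
    fitness = np.empty(n)
    for i in range(n):
        dists = [mmd_squared(action_batches[i], action_batches[j])
                 for j in range(n) if j != i]
        fitness[i] = returns[i] + weight * np.mean(dists)
    return fitness

def cem_update(population, fitness, elite_frac=0.5):
    """CEM step: refit a diagonal Gaussian to the highest-fitness members."""
    n_elite = max(1, int(len(population) * elite_frac))
    elite = population[np.argsort(fitness)[::-1][:n_elite]]
    # A small constant keeps the standard deviation from collapsing to zero.
    return elite.mean(axis=0), elite.std(axis=0) + 1e-6

# Toy usage: three policies with equal return; the third behaves differently.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 2))
c = a + 3.0
scores = diversity_fitness(np.array([1.0, 1.0, 1.0]), [a, a.copy(), c])
```

In this toy case the third policy receives the highest fitness because its actions are far from the other two, so a CEM update driven by these scores would favor it even though all returns are equal.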

Main Results:

  • DEPRL demonstrated excellent performance on continuous control tasks within the MuJoCo environment.
  • Achieved a nearly 20% increase in return over TD3 in the Ant-v2 environment.
  • Effectively reduced the risk of falling into local optima by enhancing policy exploration.

Conclusions:

  • DEPRL successfully mitigates the local optima issue in reinforcement learning through enhanced policy diversity.
  • The proposed method offers a promising approach for improving the performance and exploration capabilities of deep reinforcement learning agents in continuous control.