Updated: Oct 24, 2025

Diversity Evolutionary Policy Deep Reinforcement Learning.

Jian Liu¹,², Liming Feng¹,²

¹School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China.

Computational Intelligence and Neuroscience

August 16, 2021

Summary

This summary is machine-generated.

Reinforcement learning agents can get stuck in local optima. This study introduces a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm to enhance exploration and improve performance in continuous control tasks.

Area of Science:

Artificial Intelligence
Machine Learning
Robotics

Background:

Policy gradient reinforcement learning algorithms can become trapped in local optima when gradients vanish, which hinders agent exploration.
Existing methods struggle with maintaining policy diversity, limiting performance in complex continuous control tasks.

Purpose of the Study:

To propose a novel algorithm, Diversity Evolutionary Policy Deep Reinforcement Learning (DEPRL), to address the local optima problem in reinforcement learning.
To enhance the exploration capabilities of reinforcement learning agents by promoting policy diversity.

Main Methods:

Combined the Cross-Entropy Method (CEM), Maximum Mean Discrepancy (MMD), and the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm.
Utilized MMD to measure policy distance, encouraging maximization of return and inter-policy distance during gradient updates.
Incorporated cumulative returns and policy distance into population fitness to promote offspring diversity.
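
The methods above can be illustrated with a minimal sketch. It is not the authors' implementation. The summary states only that MMD measures policy distance and that returns and policy distance both enter the population fitness. Everything else here is an assumption made for illustration: the Gaussian kernel and its bandwidth, the biased MMD estimator, comparing policies through batches of sampled actions, the additive `weight` term, and the elite fraction in the CEM step. All function names are hypothetical, and the TD3 gradient update is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: List[Vector], ys: List[Vector], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD between two sample sets."""
    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        total = sum(gaussian_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def diversity_fitness(
    returns: List[float],
    action_batches: List[List[Vector]],
    weight: float = 1.0,
) -> List[float]:
    """Hypothetical fitness: return plus weighted mean MMD^2 to the other policies.

    action_batches[i] holds actions sampled from policy i on a shared set of states.
    """
    n = len(returns)
    fitness = []
    for i in range(n):
        distances = [
            mmd_squared(action_batches[i], action_batches[j])
            for j in range(n)
            if j != i
        ]
        diversity = sum(distances) / len(distances) if distances else 0.0
        fitness.append(returns[i] + weight * diversity)
    return fitness


def cem_update(
    population: List[List[float]],
    fitness: List[float],
    elite_frac: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """One CEM step: refit a diagonal Gaussian to the highest-fitness parameter vectors."""
    n_elite = max(1, int(len(population) * elite_frac))
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    elites = [population[i] for i in ranked[:n_elite]]
    dim = len(elites[0])
    mean = [sum(e[d] for e in elites) / n_elite for d in range(dim)]
    var = [sum((e[d] - mean[d]) ** 2 for e in elites) / n_elite for d in range(dim)]
    return mean, var
```

In this sketch, a policy whose actions differ from the rest of the population receives a higher fitness than an equally rewarded policy that duplicates its neighbors, so the CEM step is more likely to keep it among the elites. Setting `weight` to zero recovers a purely return-based ranking.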

Main Results:

DEPRL demonstrated excellent performance on continuous control tasks within the MuJoCo environment.
Achieved a nearly 20% increase in return over TD3 in the Ant-v2 environment.
Effectively reduced the risk of falling into local optima by enhancing policy exploration.

Conclusions:

DEPRL successfully mitigates the local optima issue in reinforcement learning through enhanced policy diversity.
The proposed method offers a promising approach for improving the performance and exploration capabilities of deep reinforcement learning agents in continuous control.