Collective reflection-based multi-agent reinforcement learning framework for task-oriented dialogue policy learning.

Kai Xu¹, Zhenyu Wang², Yangyang Zhao³

¹Guangdong Provincial Key Laboratory of Intellectual Property and Big Data, Guangdong Polytechnic Normal University, Guangzhou, 510665, Guangdong, China; School of Software Engineering, South China University of Technology, Guangzhou, 510641, Guangdong, China.

Neural Networks : the Official Journal of the International Neural Network Society

May 23, 2026

Summary

This summary is machine-generated.

This study introduces Multi-Agent dialogue Policy Learning (MAPL), a centralized multi-agent reinforcement learning approach to task-oriented dialogue policy learning. MAPL improves credit assignment and collaboration among agents, leading to higher task completion and dialogue success rates.

Keywords:

Deep reinforcement learning; Dialogue policy learning; Human-machine dialogue system; Multi-agent reinforcement learning

Area of Science:

Artificial Intelligence
Machine Learning
Natural Language Processing

Background:

Multi-agent reinforcement learning is crucial for cooperative dialogue policy modeling.
Existing methods struggle with error propagation and adaptive collaboration.
Effective credit assignment and balanced agent relationships are key challenges.

Purpose of the Study:

To propose a centralized Multi-Agent dialogue Policy Learning (MAPL) approach.
To enhance credit assignment and enable adaptive collaboration in dialogue systems.
To improve the efficiency and success rate of dialogue policies.

Main Methods:

MAPL utilizes multiple auxiliary agents and a main agent for policy learning.
Auxiliary agents update Q-values and assign user intent-level credits.
A balancing parameter dynamically adjusts intent credibility and policy behavior.
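The three points above can be illustrated with a toy sketch. This is not the MAPL algorithm from the paper, whose architecture and update rules are not given in this summary. It assumes tabular Q-learning, one hypothetical auxiliary agent per user intent, a hand-supplied `credits` dictionary standing in for intent-level credit assignment, and a fixed `beta` standing in for the balancing parameter (which the paper adjusts dynamically). All class, function, and parameter names are invented for illustration.

```python
from collections import defaultdict


class TabularAgent:
    """Plain tabular Q-learner (an assumption; the paper's agents are not specified here)."""

    def __init__(self, actions, lr=0.5, gamma=0.9):
        self.actions = list(actions)
        self.lr = lr          # learning rate
        self.gamma = gamma    # discount factor
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.lr * (target - self.q[(state, action)])


def combined_values(main, auxiliaries, credits, state, beta):
    """Blend main-agent Q-values with credit-weighted auxiliary Q-values.

    credits: hypothetical intent -> credit weights (higher means more trusted).
    beta:    hypothetical balancing parameter in [0, 1]; 0 ignores the auxiliaries.
    """
    total = sum(credits.values()) or 1.0  # avoid division by zero
    values = {}
    for action in main.actions:
        aux = sum(
            credits[intent] * agent.q[(state, action)]
            for intent, agent in auxiliaries.items()
        ) / total
        values[action] = (1 - beta) * main.q[(state, action)] + beta * aux
    return values


def greedy_action(values):
    """Pick the action with the highest blended value."""
    return max(values, key=values.get)
```

For example, with actions `["request", "inform"]` and auxiliary agents for two made-up intents, an auxiliary agent that has been rewarded for `inform` pulls the main agent's choice toward `inform` once `beta` is above zero and that intent carries credit. Raising `beta` gives the auxiliary agents more influence, and lowering it leaves the main agent closer to its own estimates.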

Main Results:

MAPL demonstrates more efficient policy learning capabilities.
Experiments show a higher dialogue success rate across three datasets.
Ablation studies confirm the positive impact of agent number and configuration.

Conclusions:

The proposed MAPL approach effectively addresses limitations in existing multi-agent dialogue systems.
MAPL offers a practical and efficient method for enhancing dialogue policy performance.
The approach shows significant improvements in task completion and overall dialogue success.