Related Experiment Video

Updated: May 25, 2026

Virtual Agent for Real-Time Motivational Interviewing by Integrating Adaptive Nonverbal Behavior and Language Models (07:14)

Published on: December 23, 2025

Collective reflection-based multi-agent reinforcement learning framework for task-oriented dialogue policy learning.

Kai Xu¹, Zhenyu Wang², Yangyang Zhao³

  • ¹ Guangdong Provincial Key Laboratory of Intellectual Property and Big Data, Guangdong Polytechnic Normal University, Guangzhou, 510665, Guangdong, China; School of Software Engineering, South China University of Technology, Guangzhou, 510641, Guangdong, China.

Neural Networks: the Official Journal of the International Neural Network Society | May 23, 2026
PubMed
Summary
This summary is machine-generated.

This study introduces a Multi-Agent dialogue Policy Learning (MAPL) approach for task-oriented dialogue systems. MAPL improves credit assignment and collaboration among agents, leading to higher task completion and dialogue success rates.

Keywords:
Deep reinforcement learning, Dialogue policy learning, Human-machine dialogue system, Multi-agent reinforcement learning

Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Natural Language Processing

Background:

  • Multi-agent reinforcement learning is crucial for cooperative dialogue policy modeling.
  • Existing methods struggle with error propagation and adaptive collaboration.
  • Effective credit assignment and balanced agent relationships are key challenges.

Purpose of the Study:

  • To propose a centralized Multi-Agent dialogue Policy Learning (MAPL) approach.
  • To enhance credit assignment and enable adaptive collaboration in dialogue systems.
  • To improve the efficiency and success rate of dialogue policies.

Main Methods:

  • MAPL utilizes multiple auxiliary agents and a main agent for policy learning.
  • Auxiliary agents update Q-values and assign user intent-level credits.
  • A balancing parameter dynamically adjusts intent credibility and policy behavior.
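
The interplay between a main agent, auxiliary agents, and a balancing parameter can be illustrated with a toy sketch. This is not the MAPL algorithm itself, whose details are not given in this summary. It assumes tabular Q-learning in place of deep networks, one auxiliary agent per user intent, and a fixed balancing parameter `beta` (the summary describes it as adjusted dynamically). The names `TabularQAgent`, `blended_q`, `intent_credit`, and the intents `"book"` and `"cancel"` are hypothetical.

```python
from collections import defaultdict


class TabularQAgent:
    """Minimal tabular Q-learning agent (an illustrative stand-in, not the paper's model)."""

    def __init__(self, n_actions, alpha=0.5, gamma=0.9):
        self.n_actions = n_actions
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def update(self, state, action, reward, next_state, done):
        # Standard one-step Q-learning update.
        bootstrap = 0.0 if done else self.gamma * max(self.q[next_state])
        target = reward + bootstrap
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def blended_q(main, auxiliaries, state, intent_credit, beta):
    """Mix the main agent's Q-values with credit-weighted auxiliary Q-values.

    beta is the assumed balancing parameter in [0, 1]:
    beta = 0 uses only the main agent, beta = 1 uses only the auxiliary agents.
    """
    total = sum(intent_credit.values()) or 1.0  # avoid division by zero
    blended = []
    for a in range(main.n_actions):
        aux = sum(
            intent_credit[intent] / total * agent.q[state][a]
            for intent, agent in auxiliaries.items()
        )
        blended.append((1 - beta) * main.q[state][a] + beta * aux)
    return blended


def greedy_action(q_values):
    """Index of the largest Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Toy setup: two actions, two hypothetical user intents.
main = TabularQAgent(n_actions=2)
aux = {"book": TabularQAgent(2), "cancel": TabularQAgent(2)}

# The main agent saw a small reward for action 0; the "book" auxiliary
# agent saw a larger reward for action 1 in the same state.
main.update("s0", 0, 0.2, "s1", done=True)
aux["book"].update("s0", 1, 1.0, "s1", done=True)

# Intent-level credit: all credit goes to the "book" intent in this toy case.
credit = {"book": 1.0, "cancel": 0.0}

for beta in (0.0, 0.5, 1.0):
    q = blended_q(main, aux, "s0", credit, beta)
    print(beta, q, greedy_action(q))
```

With `beta = 0.0` the greedy choice follows the main agent alone, while larger values let the credited auxiliary agent shift the choice toward the action it learned. A dynamic schedule for `beta`, as the summary describes, would replace the fixed values in the loop.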

Main Results:

  • MAPL demonstrates more efficient policy learning capabilities.
  • Experiments show a higher dialogue success rate across three datasets.
  • Ablation studies confirm the positive impact of agent number and configuration.

Conclusions:

  • The proposed MAPL approach effectively addresses limitations in existing multi-agent dialogue systems.
  • MAPL offers a practical and efficient method for enhancing dialogue policy performance.
  • The approach shows significant improvements in task completion and overall dialogue success.