LLM-Augmented Multi-Agent Reinforcement Learning for Cross-Scenario Knowledge Transfer.

Chao Li¹, Yanfei Liu¹, Jieling Wang¹

  • ¹Department of Basic Courses, Rocket Force University of Engineering, Xi'an 710025, China.

Entropy (Basel, Switzerland) | May 26, 2026
Summary
This summary is machine-generated.

This study introduces LoLM-MARL, a novel method for multi-agent reinforcement learning (MARL) that uses large language models (LLMs) to improve policy transfer efficiency. LoLM-MARL significantly enhances learning speed and generalization in complex collaborative tasks.

Keywords:
annealing Kullback–Leibler divergence; dynamic prompt; knowledge transfer; large language models; low-rank adaptation; multi-agent reinforcement learning

Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Robotics

Background:

  • Multi-agent reinforcement learning (MARL) faces challenges with low sample efficiency due to extensive trial-and-error interactions.
  • Real-world applications of MARL are limited by the high cost of data acquisition and training.
  • Accelerating policy convergence and improving transferability are critical for advancing MARL.

Purpose of the Study:

  • To propose LoLM-MARL, a novel MARL policy transfer method leveraging large language models (LLMs).
  • To enhance sample efficiency and accelerate convergence in multi-agent collaborative tasks.
  • To enable effective cross-scenario policy transfer in complex dynamic environments.

Main Methods:

  • Fine-tuning pre-trained LLMs with low-rank adaptation (LoRA) to instill general decision-making knowledge.
  • Designing a dynamic prompt construction method that optimizes agent state information for LLMs.
  • Implementing Kullback-Leibler (KL) divergence regularization with an annealing strategy to prevent catastrophic forgetting.
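
The KL regularization with annealing listed above can be illustrated with a minimal sketch. The summary does not give the paper's loss, the KL direction, or the annealing schedule, so everything here is an assumption for illustration: a linear decay of the KL weight, KL measured from the current policy to a frozen reference policy, and hypothetical names (`kl_divergence`, `annealed_beta`, `regularized_loss`).

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete action distributions given as probability lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def annealed_beta(step, total_steps, beta_start=1.0, beta_end=0.0):
    """KL weight at a given training step.

    Assumed schedule: linear interpolation from beta_start to beta_end.
    The source only says an annealing strategy is used, not which one.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + frac * (beta_end - beta_start)


def regularized_loss(task_loss, policy_probs, reference_probs, step, total_steps):
    """Task loss plus an annealed KL penalty toward a frozen reference policy."""
    beta = annealed_beta(step, total_steps)
    return task_loss + beta * kl_divergence(policy_probs, reference_probs)


if __name__ == "__main__":
    reference = [0.7, 0.2, 0.1]  # hypothetical pre-transfer policy output
    current = [0.4, 0.4, 0.2]    # hypothetical fine-tuned policy output
    for step in (0, 50, 100):
        print(step, regularized_loss(1.0, current, reference, step, 100))
```

In this sketch the penalty is strongest early in training, when drifting from the reference policy is most likely to erase transferred knowledge, and fades as the weight decays so the policy can adapt to the new scenario. Whether the paper anneals in this direction is not stated in the summary.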

Main Results:

  • LoLM-MARL achieved up to a 101.4% improvement in average win rate on zero-shot transfer tasks compared to state-of-the-art (SOTA) methods.
  • Consistent improvements in generalization performance were observed across six few-shot transfer tasks.
  • Convergence speed increased by 4 to 30 times compared to training from scratch.

Conclusions:

  • LoLM-MARL offers a new paradigm for efficient policy transfer in MARL by utilizing LLM capabilities.
  • The method significantly improves learning efficiency and generalization in complex multi-agent systems.
  • Dynamic prompt design and KL regularization are key components for successful LLM-based MARL policy transfer.