Updated: Jul 4, 2026

Exploiting Tree Structure for Credit Assignment in Reinforcement Learning with Large Language Models.

Hieu Tran1,2, Zonghai Yao1,2, Hong Yu1,2,3

  • 1Center for Healthcare Organization and Implementation Research, VA Bedford Health Care.

Findings of ACL. ACL | July 3, 2026
Summary
This summary is machine-generated.

TEMPO is a critic-free reinforcement learning algorithm for large language models (LLMs). It organizes sampled responses into a prefix tree and uses the values estimated from that tree to assign credit to individual tokens, which improves policy optimization and model performance.

Area of Science:

  • Artificial Intelligence
  • Machine Learning
  • Natural Language Processing

Background:

  • Reinforcement learning (RL) enhances large language model (LLM) reasoning.
  • Token-level credit assignment is challenging due to sparse, delayed rewards in LLMs.
  • Existing methods have trade-offs: PPO (actor-critic) must train a separate value model, which adds complexity, while GRPO (critic-free) gives every token in a response the same group-relative advantage.
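
To make the credit-assignment limitation concrete, here is a minimal Python sketch of a GRPO-style group-relative advantage. It is an illustration under stated assumptions, not the implementation used in the study: the function names, the `eps` constant, and the use of the population standard deviation are all choices made for this example.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-6):
    """Standardize each response's outcome reward within its sampled group.

    Assumption: population standard deviation plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def broadcast_to_tokens(responses, advantages):
    """Copy each response's single advantage onto every one of its tokens."""
    return [[adv] * len(tokens) for tokens, adv in zip(responses, advantages)]

# Two sampled responses to the same prompt; only the first is correct.
responses = [["2", "+", "2", "=", "4"], ["2", "+", "2", "=", "5"]]
rewards = [1.0, 0.0]

advantages = grpo_advantages(rewards)
token_advantages = broadcast_to_tokens(responses, advantages)
```

Both responses share the prefix `2 + 2 =`, yet those shared tokens receive a positive advantage in one response and a negative one in the other. Only the final token actually distinguishes the outcomes, and a sequence-level signal cannot express that.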

Purpose of the Study:

  • To address the challenge of token-level credit assignment in RL for LLMs.
  • To develop a critic-free RL algorithm that improves upon existing methods.
  • To enhance the reasoning abilities and performance of LLMs through more effective reward propagation.

Main Methods:

  • Proposed Prefix-to-Tree (P2T) to organize sampled responses into a prefix tree.
  • Developed TEMPO (Tree-Estimated Mean Prefix Value for Policy Optimization), a critic-free algorithm.
  • Integrated branch-aware temporal-difference (TD) corrections into GRPO using P2T-derived prefix values.
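
The prefix-tree idea above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' code: it assumes a prefix value is the mean outcome reward of all sampled responses that pass through that prefix, and that the TD term for a token is the difference between the values after and before it. The function names are hypothetical, and a dictionary keyed by token tuples stands in for an explicit tree.

```python
from collections import defaultdict

def prefix_values(responses, rewards):
    """Map each prefix (a tuple of tokens) to the mean reward of the
    sampled responses that start with it. The empty tuple is the root."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens, reward in zip(responses, rewards):
        for t in range(len(tokens) + 1):
            prefix = tuple(tokens[:t])
            totals[prefix] += reward
            counts[prefix] += 1
    return {p: totals[p] / counts[p] for p in totals}

def td_corrections(tokens, values):
    """Per-token TD term: V(prefix + token) - V(prefix)."""
    return [
        values[tuple(tokens[: t + 1])] - values[tuple(tokens[:t])]
        for t in range(len(tokens))
    ]

# Three sampled responses; the tree branches after "a" and after "a b".
responses = [["a", "b", "c"], ["a", "b", "d"], ["a", "e", "f"]]
rewards = [1.0, 0.0, 0.0]

values = prefix_values(responses, rewards)
td_r1 = td_corrections(responses[0], values)
td_r3 = td_corrections(responses[2], values)
```

In this toy group, the first token `a` is shared by every response, so its TD term is zero. The terms become non-zero only where the tree branches, which is where the responses' outcomes start to differ. The per-token terms of a response also sum to its reward minus the root value. How these terms are weighted and combined with the GRPO advantage is not specified in this summary, so that step is left out.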

Main Results:

  • TEMPO consistently improved convergence and final performance over PPO and GRPO on Qwen3 models.
  • Performance gains were observed on both in-distribution (MATH, MedQA) and out-of-distribution benchmarks (GSM-HARD, AMC23, MedMCQA, MMLU-Medical).
  • TEMPO achieved higher validation accuracy within comparable wall-clock time.

Conclusions:

  • TEMPO offers an effective critic-free approach for RL in LLMs, overcoming limitations of previous methods.
  • The prefix tree structure and TD corrections enable more nuanced credit assignment.
  • TEMPO improves LLM reasoning performance on mathematical and medical benchmarks, both in-distribution and out-of-distribution.