Intrinsic Value-Aligned Policy Optimization (IVPO) enhances offline-to-online reinforcement learning by balancing optimism and pessimism in Q-value estimation. This novel approach mitigates performance drops during online finetuning, achieving state-of-the-art results.
Area of Science:
Background:
Purpose of the Study:
Main Methods:
Main Results:
Conclusions: