Reinforcement
Diffusion
Observational Learning
Reinforcement Schedules
Instinctive Drift
Associative Learning
Updated: Sep 14, 2025
Reward fine-tuning of diffusion models can over-optimize the reward and harm performance. Constrained Diffusion Policy Optimization (CDPO) counters this with step-specific rewards, neuron resets, and auxiliary objectives, improving model alignment and generalization.
Area of Science:
Background:
Purpose of the Study:
Main Methods:
Main Results:
Conclusions: