
A Step-by-Step Implementation of DeepBehavior, Deep Learning Toolbox for Automated Behavior Analysis
Published on: February 6, 2020
Lingheng Meng1, Rob Gorbet2, Michael Burke3
1Electrical and Computer Engineering, University of Waterloo, 200 University Avenue West, Waterloo, N2L 3G1, ON, Canada; Electrical and Computer Systems Engineering, Monash University, 18 Alliance Lane, Clayton, 3800, VIC, Australia; Data61, CSIRO, Research Way, Clayton, 3168, VIC, Australia.
Proximal Policy Optimization (PPO) is more robust in partially observable environments than Twin Delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC). Multi-step bootstrapping, which PPO already uses and which can be adapted to TD3 and SAC, improves performance in these challenging settings.
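To make the term concrete, here is a minimal sketch of an n-step bootstrapped return, the quantity that multi-step bootstrapping uses as a learning target. It is a generic textbook illustration, not the authors' implementation; the function name `n_step_return`, its parameters, and the default discount are assumptions chosen for this example.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float],
                  bootstrap_value: float,
                  gamma: float = 0.99) -> float:
    """Return r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * V(s_n).

    rewards:          the n rewards observed after the current state (hypothetical input)
    bootstrap_value:  the critic's value estimate for the state reached after n steps
    gamma:            discount factor (0.99 is an assumed default)
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first,
    # discounting the running target once per step.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


if __name__ == "__main__":
    # With a single reward this reduces to the usual one-step TD target.
    print(n_step_return([1.0], bootstrap_value=10.0, gamma=0.5))
    # Three observed rewards before bootstrapping from the value estimate.
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

With one reward the target depends almost entirely on the value estimate; with more rewards it leans more on observed outcomes, which is one plausible reason longer horizons help when single observations are uninformative.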