PPO (Proximal Policy Optimization) is part of the broader family of policy gradient algorithms.

PPO seems to be a win mainly because it’s simpler to implement and operate than alternatives. It performs comparably to ACER or TRPO, but is much easier to tune and seems to apply more broadly. The core idea is to improve the policy a bit on each update while keeping the new policy close to the old one, so no single update can throw the policy completely out of whack.
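A rough sketch of the clipped surrogate objective that captures this "improve, but not too far" idea, written in PyTorch. The function name and argument names are my own choices for illustration, and clip_eps=0.2 is the commonly cited default from the PPO paper; this is a sketch, not a full training loop.

import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed from log-probs
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped surrogate: the usual policy gradient term, ratio * advantage
    unclipped = ratio * advantages
    # Clipped surrogate: the ratio is not allowed to leave [1 - eps, 1 + eps]
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the element-wise minimum (the pessimistic bound) and negate to get a loss
    return -torch.min(unclipped, clipped).mean()

Taking the minimum of the clipped and unclipped terms removes any incentive for the optimizer to push the probability ratio far outside [1 - eps, 1 + eps], which is what keeps a single update from moving the policy too far from the old one.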

I don’t really understand much here.

Sources

https://openai.com/blog/openai-baselines-ppo/
https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b