PPO (Proximal Policy Optimization) is part of the broader family of policy gradient algorithms.

PPO seems to be a win mainly because it’s simpler to implement and operate than alternatives. It performs comparably to ACER or TRPO, but is much easier to tune and seems to apply more broadly. The core idea is to improve the policy a bit on each update while keeping the new policy close to the old one, so no single update can throw the policy completely out of whack.
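A rough sketch of the clipped surrogate objective that captures this "improve, but not too far" idea, written in PyTorch. The function name and argument names are my own choices for illustration, and clip_eps=0.2 is the commonly cited default from the PPO paper; this is a sketch, not a full training loop.

import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed from log-probs
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped surrogate: the usual policy gradient term, ratio * advantage
    unclipped = ratio * advantages
    # Clipped surrogate: the ratio is not allowed to leave [1 - eps, 1 + eps]
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the element-wise minimum (the pessimistic bound) and negate to get a loss
    return -torch.min(unclipped, clipped).mean()

Taking the minimum of the clipped and unclipped terms removes any incentive for the optimizer to push the probability ratio far outside [1 - eps, 1 + eps], which is what keeps a single update from moving the policy too far from the old one.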

I don’t really understand much here.

Sources

https://openai.com/blog/openai-baselines-ppo/
https://towardsdatascience.com/proximal-policy-optimization-ppo-explained-abed1952457b