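- sample actions from the current policy
- observe the rewards they produce
- adjust the policy parameters so that high-reward actions become more likely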
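
The three steps above can be sketched as a small loop. This is only an illustrative sketch, not the method from the linked article. It assumes a toy multi-armed bandit with Gaussian rewards, a softmax policy with one preference per action, and a REINFORCE-style update with a running-average baseline. The names `softmax`, `train_bandit`, `reward_means`, `lr`, and `baseline` are hypothetical and chosen for this example.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(reward_means, steps=2000, lr=0.1, seed=0):
    """Assumed toy setup: each action pays a noisy reward around its mean."""
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_means)  # policy parameters, one per action
    baseline = 0.0                     # running average of observed rewards

    for t in range(1, steps + 1):
        probs = softmax(prefs)

        # 1. sample an action from the current policy
        action = rng.choices(range(len(prefs)), weights=probs)[0]

        # 2. observe the reward (noise level 0.5 is an arbitrary assumption)
        reward = rng.gauss(reward_means[action], 0.5)

        # 3. adjust the policy: for a softmax policy, the gradient of
        #    log pi(action) w.r.t. prefs[i] is 1[i == action] - probs[i]
        advantage = reward - baseline
        for i in range(len(prefs)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * advantage * grad_log_prob

        # update the running-average baseline after using it
        baseline += (reward - baseline) / t

    return softmax(prefs)


if __name__ == "__main__":
    # the second action has the higher mean reward in this made-up example
    print(train_bandit([0.0, 1.0]))
```

Subtracting the baseline does not change the expected gradient but usually reduces its variance, which is why it is included here even in such a small example.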
Sources
https://towardsdatascience.com/policy-gradients-in-reinforcement-learning-explained-ecec7df94245