Sample actions, observe rewards, tweak the policy.

Sources: https://towardsdatascience.com/policy-gradients-in-reinforcement-learning-explained-ecec7df94245
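The three-step loop summarized above (sample, observe, tweak) can be sketched as a minimal REINFORCE-style policy gradient on a toy three-armed bandit. The following are assumptions for illustration and do not come from the source: the helper names `softmax` and `train_bandit_policy`, the arm means, the Gaussian reward noise, the learning rate, and the running-mean baseline.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences (logits) into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit_policy(true_means, steps=5000, lr=0.1, seed=0):
    """Hypothetical toy example: REINFORCE on a multi-armed bandit.

    Each step samples an action from the softmax policy, observes a noisy
    reward, and nudges the preferences along (reward - baseline) * grad log pi.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)  # policy parameters: one logit per arm
    baseline = 0.0                   # running mean reward (variance reduction)
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # 1. Sample an action from the current policy.
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        # 2. Observe a reward (assumed Gaussian noise around the arm's mean).
        reward = rng.gauss(true_means[action], 0.1)
        # 3. Tweak the policy. For a softmax policy,
        #    d log pi(a) / d pref_k = 1[k == a] - pi(k).
        advantage = reward - baseline
        for k in range(len(prefs)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * advantage * grad_log_pi
        baseline += (reward - baseline) / t  # incremental average of rewards
    return softmax(prefs)


if __name__ == "__main__":
    final_probs = train_bandit_policy([0.2, 0.5, 0.8])
    print([round(p, 3) for p in final_probs])
```

Running the script prints the final action probabilities. With these settings, most of the probability mass should end up on the arm with the highest mean reward (index 2). Subtracting the baseline does not change the expected update, but it reduces the variance of each individual step.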