Snake model trained with Proximal Policy Optimization
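
The description does not say which PPO variant or training library was used. As a rough illustration only, the sketch below computes the clipped surrogate loss that PPO is commonly associated with, in plain Python. The function name `ppo_clip_loss`, the argument names, and the `clip_eps=0.2` default are assumptions made for this example and are not taken from the model's actual training code.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the negated mean clipped surrogate objective for one batch.

    Hypothetical helper for illustration; not the author's training code.
    All three inputs are equal-length sequences of floats, one entry per
    sampled (state, action) pair.
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if not advantages:
        raise ValueError("batch must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the policy
        # that collected the data: pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)

    # Negate so that minimizing the loss maximizes the objective.
    return -total / len(advantages)


if __name__ == "__main__":
    # Toy batch of three Snake moves with made-up numbers.
    loss = ppo_clip_loss(
        new_log_probs=[-1.0, -0.5, -2.0],
        old_log_probs=[-1.2, -0.5, -1.5],
        advantages=[1.0, -0.5, 2.0],
    )
    print(f"clipped surrogate loss: {loss:.4f}")
```

A full PPO setup would typically add a value-function loss and an entropy bonus to this term; those are left out here to keep the sketch short.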