Reinforcement Learning PyTorch Tic Tac Toe