Optimal Policy Reinforcement Learning