Optimal Value Function Reinforcement Learning