The Reinforcement Learning Problem