Bandit Problem Reinforcement Learning Tutorial