Reinforcement Learning Reward Decrease