How Does Reinforcement Learning Work In Nlp