Behavior Policy Reinforcement Learning From Human