Reinforcement Learning Based On Mpc And The Stochastic Policy Gradient Method