Revisiting Natural Gradient For Deep Networks