How Does Attention Work In Neural Networks