Video Prediction By Efficient Transformers