Vision Language Models Explained