When and Why Vision-Language Models