Vision Language Models Survey