Building Vision Language Models On Price