How Do Large Language Models Generate Video from Text?