How Does an LLM Generate Text to Voice?