How Does An LLM Generate Text From Audio?