Optimizing Large Language Modeling From Scratch