Compositional Visual Generation From Text