What Is Spatial In Multimodal Learning