Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIP