When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
Paper • 2604.08546 • Published • 109
Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models