Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
Abstract
Kinema4D is a 4D generative robotic simulator that couples precise kinematic robot control with spatiotemporal synthesis of environmental reactions, enabling physically plausible, embodiment-agnostic simulation with zero-shot transfer capability.
Simulating robot-world interactions is a cornerstone of Embodied AI. Recently, a few works have shown promise in leveraging video generation to transcend the rigid visual and physical constraints of traditional simulators. However, they primarily operate in 2D space or are guided by static environmental cues, ignoring the fundamental reality that robot-world interactions are inherently 4D spatiotemporal events that require precise interactive modeling. To restore this 4D essence while ensuring precise robot control, we introduce Kinema4D, a new action-conditioned 4D generative robotic simulator that disentangles robot-world interaction into: i) Precise 4D representation of robot controls: we drive a URDF-based 3D robot via kinematics, producing an exact 4D robot control trajectory. ii) Generative 4D modeling of environmental reactions: we project the 4D robot trajectory into a pointmap that serves as a spatiotemporal visual signal, conditioning the generative model to synthesize the environment's complex reactive dynamics as synchronized RGB/pointmap sequences. To facilitate training, we curate a large-scale dataset, Robo4D-200k, comprising 201,426 robot interaction episodes with high-quality 4D annotations. Extensive experiments demonstrate that our method effectively simulates physically plausible, geometry-consistent, and embodiment-agnostic interactions that faithfully mirror diverse real-world dynamics. For the first time, it also shows potential zero-shot transfer capability, providing a high-fidelity foundation for advancing next-generation embodied simulation.
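As a rough illustration of the two-stage pipeline described in the abstract, the sketch below (Python/NumPy) drives a toy arm with forward kinematics to produce a 4D control trajectory, then rasterizes that trajectory into per-frame pointmaps that could condition a video generator. The 2-link arm, camera pose, image resolution, and intrinsics are placeholder assumptions rather than details from the paper, which drives a full URDF-based robot model; this is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): (i) forward kinematics -> 4D robot
# control trajectory, (ii) pinhole projection -> per-frame pointmap signal.
# The 2-link arm, camera pose, intrinsics, and 64x64 resolution are assumptions.
import numpy as np

def rot_z(theta):
    """4x4 homogeneous transform: rotation about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def trans(x, y, z):
    """4x4 homogeneous transform: pure translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def forward_kinematics(q):
    """Toy 2-joint planar arm: world positions of base, elbow, and end effector."""
    T1 = rot_z(q[0]) @ trans(0.3, 0.0, 0.0)        # link 1, 0.30 m
    T2 = T1 @ rot_z(q[1]) @ trans(0.25, 0.0, 0.0)  # link 2, 0.25 m
    return np.stack([np.zeros(3), T1[:3, 3], T2[:3, 3]])

def to_pointmap(points_world, T_world_to_cam, K, hw=(64, 64)):
    """Project 3D points into an HxWx3 pointmap holding camera-frame XYZ per pixel."""
    H, W = hw
    pm = np.zeros((H, W, 3), dtype=np.float32)
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    pts_cam = (T_world_to_cam @ pts_h.T).T[:, :3]
    for p in pts_cam:
        if p[2] <= 1e-6:                            # skip points behind the camera
            continue
        u, v, _ = K @ (p / p[2])                    # pinhole projection
        ui, vi = int(round(u)), int(round(v))
        if 0 <= ui < W and 0 <= vi < H:
            pm[vi, ui] = p                          # store camera-frame XYZ
    return pm

# Per-frame joint angles (the 4D control trajectory) -> per-frame pointmaps.
K = np.array([[60.0, 0.0, 32.0], [0.0, 60.0, 32.0], [0.0, 0.0, 1.0]])  # toy intrinsics
T_wc = trans(0.0, 0.0, 1.0)            # camera 1 m behind the arm plane, looking along +z
frames = []
for t in range(16):                    # a 16-frame episode
    q = np.array([0.5 * np.sin(0.2 * t), 0.3 * np.cos(0.2 * t)])
    frames.append(to_pointmap(forward_kinematics(q), T_wc, K))
pointmap_video = np.stack(frames)      # (16, 64, 64, 3) spatiotemporal conditioning signal
```

In the full system, such per-frame pointmaps (paired with RGB frames) play the role of the spatiotemporal visual signal that controls the generative model's synthesis of environmental reactions.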
Community
Project page: https://mutianxu.github.io/Kinema4D-project-page/
Demo video: https://www.youtube.com/watch?v=9Z1fLIwuZdM
the idea of tying exact 4D robot kinematics to a learned 4D world reaction is neat, i like the disentanglement of control from environment dynamics. but for fast, contact-rich maneuvers, projection drift and occlusions could desynchronize the 4D pointmap from the true motion, which might hurt geometry consistency. did you test how sensitive the results are to the fidelity of the 4D projection, or to frame-rate differences between the kinematic trajectory and the generated sequence? btw, there's a solid walkthrough on arxivlens that helped me parse the method details: https://arxivlens.com/PaperView/Details/kinema4d-kinematic-4d-world-modeling-for-spatiotemporal-embodied-simulation-1744-9a570cd1
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- ImagiNav: Scalable Embodied Navigation via Generative Visual Prediction and Inverse Dynamics (2026)
- Egocentric World Model for Photorealistic Hand-Object Interaction Synthesis (2026)
- Mirage2Matter: A Physically Grounded Gaussian World Model from Video (2026)
- AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation (2026)
- DiT4DiT: Jointly Modeling Video Dynamics and Actions for Generalizable Robot Control (2026)
- TC-IDM: Grounding Video Generation for Executable Zero-shot Robot Motion (2026)
- Beyond Dense Futures: World Models as Structured Planners for Robotic Manipulation (2026)