GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation

Snehal Singh Tomar · Alexandros Graikos · Arjun Krishna · Dimitris Samaras · Klaus Mueller

Transactions on Machine Learning Research (TMLR) 2026

Stony Brook University

TL;DR: State-of-the-art image sequence generation models treat image sequences as large tensors of ordered frames. In contrast, our method factorizes image sequence generation into two stages. First, we learn to model the dynamics of the sequence at low resolution, treating the frames as subsampled image grids. Second, we learn to super-resolve individual frames at high resolution. By using the DiT's self-attention mechanism to model dynamics across frames, together with our sampling strategy, our method yields superior synthesis quality for sequences of arbitrary length while significantly reducing sampling time and training data requirements.
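
To make the first stage more concrete, here is a minimal sketch of how a sequence of frames could be subsampled and tiled into a single image grid, and how such a grid could be split back into frames before per-frame super-resolution. It is an illustration under stated assumptions, not the released implementation: the function names `frames_to_grid` and `grid_to_frames`, the row-major frame ordering, and the use of strided subsampling are all hypothetical choices made for this example.

```python
import numpy as np


def frames_to_grid(frames, rows, cols, factor):
    """Subsample each frame by `factor` and tile the results into one grid image.

    frames: array of shape (T, H, W, C) with T == rows * cols.
    Assumption: frames are placed in row-major order (left to right, top to bottom).
    """
    t = frames.shape[0]
    if t != rows * cols:
        raise ValueError(f"expected {rows * cols} frames, got {t}")
    # Strided subsampling; a real pipeline might use an anti-aliased resize instead.
    small = frames[:, ::factor, ::factor, :]
    _, sh, sw, c = small.shape
    # (rows, cols, sh, sw, C) -> (rows, sh, cols, sw, C) -> (rows*sh, cols*sw, C)
    grid = small.reshape(rows, cols, sh, sw, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * sh, cols * sw, c)


def grid_to_frames(grid, rows, cols):
    """Inverse of the tiling step: split a grid image back into rows * cols frames."""
    gh, gw, c = grid.shape
    sh, sw = gh // rows, gw // cols
    # (rows, sh, cols, sw, C) -> (rows, cols, sh, sw, C) -> (T, sh, sw, C)
    tiles = grid.reshape(rows, sh, cols, sw, c).transpose(0, 2, 1, 3, 4)
    return tiles.reshape(rows * cols, sh, sw, c)


if __name__ == "__main__":
    # 16 random RGB frames at 256x256, tiled into a 4x4 grid of 64x64 cells.
    rng = np.random.default_rng(0)
    sequence = rng.random((16, 256, 256, 3), dtype=np.float32)
    grid = frames_to_grid(sequence, rows=4, cols=4, factor=4)
    low_res_frames = grid_to_frames(grid, rows=4, cols=4)
    print(grid.shape, low_res_frames.shape)
```

In this sketch the whole sequence becomes one ordinary image, so a single image diffusion model can attend across all frames at once; each recovered low-resolution frame would then be passed to the second-stage super-resolution model separately.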

Code and Execution Details

Citation

Please cite our work as:

@article{
tomar2026gridit,
title={GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation},
author={Snehal Singh Tomar and Alexandros Graikos and Arjun Krishna and Dimitris Samaras and Klaus Mueller},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2026},
url={https://openreview.net/forum?id=QLD47Ou5lp},
note={}
}

Base model: facebook/DiT-XL-2-256

Paper: GriDiT: Factorized Grid-Based Diffusion for Efficient Long Image Sequence Generation (2512.21276, published Dec 24, 2025)