DZ-TDPO: Non-Destructive Temporal Alignment for Mutable State Tracking in Long-Context Dialogue
Paper: 2512.03704
Research Preview checkpoint for the paper DZ-TDPO.
This model corresponds to the Scaling Analysis (Section 4.3) of our paper.
Because larger models exhibit strong 'Parametric Inertia', this checkpoint prioritizes language stability (low perplexity) over aggressive state updates.
We release this model to facilitate research into the Capacity-Stability Trade-off in long-context alignment.
🚀 For maximum plasticity and SOTA conflict resolution (55.4% Win Rate), please use our flagship model: DZ-TDPO-Phi-3.5-mini-instruct.
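Below is a minimal loading sketch using the `transformers` library. The repo id, the dtype, and the example dialogue are illustrative assumptions, not part of the official release; substitute the actual Hub id of this checkpoint (or of the flagship DZ-TDPO-Phi-3.5-mini-instruct) when running it.

```python
# Hedged usage sketch: loads a DZ-TDPO checkpoint and probes a simple
# mutable-state update (the user changes a previously stated fact).
# The model_id below is a placeholder, not a confirmed repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DZ-TDPO/DZ-TDPO-Phi-3.5-mini-instruct"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    # trust_remote_code=True may be required depending on the base model.
)

# A dialogue turn that overwrites an earlier fact, exercising state tracking.
messages = [
    {"role": "user", "content": "My meeting is on Tuesday."},
    {"role": "assistant", "content": "Got it, your meeting is on Tuesday."},
    {"role": "user", "content": "Actually, it moved to Thursday. When is my meeting?"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```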