DZ-TDPO (Qwen2.5-7B)

A research-preview checkpoint accompanying the paper DZ-TDPO.

⚠️ Research Note

This model corresponds to the Scaling Analysis (Section 4.3) of our paper.

Because larger models exhibit strong 'Parametric Inertia', this checkpoint prioritizes language stability (low perplexity) over aggressive state updates.

  • Win Rate: 50.8% (MSC Dataset)
  • Alignment Tax: Negligible (+1.95 PPL)

We release this model to facilitate research into the Capacity-Stability Trade-off in long-context alignment.

🚀 For maximum plasticity and SOTA conflict resolution (55.4% Win Rate), please use our flagship model: DZ-TDPO-Phi-3.5-mini-instruct.


Base model: Qwen/Qwen2.5-7B