DZ-TDPO (Qwen2.5-7B)

A research-preview checkpoint accompanying the paper DZ-TDPO.

⚠️ Research Note

This model corresponds to the Scaling Analysis (Section 4.3) of our paper.

Because larger models exhibit strong 'Parametric Inertia', this checkpoint prioritizes language stability (low perplexity) over aggressive state updates.

  • Win Rate: 50.8% (MSC Dataset)
  • Alignment Tax: Negligible (+1.95 PPL)

We release this model to facilitate research into the Capacity-Stability Trade-off in long-context alignment.

🚀 For maximum plasticity and SOTA conflict resolution (55.4% Win Rate), please use our flagship model: DZ-TDPO-Phi-3.5-mini-instruct.


Base model: Qwen/Qwen2.5-7B