Apertus-8B-Instruct-2509-SPINQUANT-FP8_dynamic

This is an FP8 dynamic quantization of swiss-ai/Apertus-8B-Instruct-2509, produced with llm-compressor.

It additionally applies the SpinQuant transformation, which rotates weights and activations with orthogonal (Hadamard) matrices. Because the rotations are orthogonal, the network's outputs are unchanged, but outlier channels are spread across dimensions, which makes the tensors easier to quantize accurately.
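The key property behind SpinQuant can be shown in a few lines: for any orthogonal matrix R, rotating the activations by R and the weights by Rᵀ leaves a linear layer's output unchanged, since x R Rᵀ W = x W. The sketch below is illustrative pure Python (not llm-compressor code), using a normalized 4×4 Hadamard matrix built with Sylvester's construction; the matrix sizes and values are made up for the demo.

```python
def sylvester_hadamard(n):
    """Build the n x n (unnormalized) Hadamard matrix; n must be a power of two."""
    H = [[1.0]]
    while len(H) < n:
        # Sylvester step: H -> [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matmul(A, B):
    """Plain nested-list matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n = 4
scale = n ** -0.5                       # normalize so the rotation is orthogonal
R = [[v * scale for v in row] for row in sylvester_hadamard(n)]
Rt = [list(col) for col in zip(*R)]     # transpose of R

x = [[1.0, -2.0, 0.5, 3.0]]             # a 1x4 activation (made-up values)
W = [[0.2, -1.0], [0.7, 0.1],
     [-0.3, 0.4], [1.5, -0.6]]          # a 4x2 weight (made-up values)

y_ref = matmul(x, W)                    # original layer output
y_rot = matmul(matmul(x, R),            # rotated activation ...
               matmul(Rt, W))           # ... times rotated weight: identical output
```

Because the rotation can be fused into the neighboring weight matrices offline, it adds no inference cost, which is why SpinQuant's R1/R2 rotations can be applied before quantization "for free".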

Quantization Details

  • Quantization Scheme: FP8_dynamic
  • Transformation: SpinQuant with Hadamard rotations (R1 & R2)
  • Method: Weights and activations quantized to FP8 (E4M3); activation scales computed dynamically at runtime
  • Targets: All Linear layers
  • Ignored Layers: lm_head (kept in higher precision for better output quality)
  • Tool: llm-compressor
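To make the "FP8_dynamic" scheme above concrete, the sketch below simulates dynamic FP8 (E4M3) quantization of one activation tensor in pure Python. This is an illustration, not the llm-compressor implementation: real kernels work on full tensors with hardware FP8 types, and this sketch ignores E4M3 subnormals. E4M3 keeps 4 significant binary digits (1 implicit + 3 mantissa bits) and saturates at 448.0.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value

def round_to_e4m3(x: float) -> float:
    """Round to 4 significant binary digits (normal-range E4M3 behaviour;
    subnormals are ignored in this sketch)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return math.copysign(round(abs(m) * 16) / 16, x) * 2.0 ** e

def dynamic_quantize(values):
    """Per-tensor dynamic scaling: the current max |x| maps onto 448.0.
    'Dynamic' means this scale is recomputed for every activation tensor
    at inference time rather than calibrated offline."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / FP8_E4M3_MAX
    q = [round_to_e4m3(min(FP8_E4M3_MAX, max(-FP8_E4M3_MAX, v / scale)))
         for v in values]
    return q, scale

acts = [0.3, -3.0, 12.0, -0.25]          # made-up activation values
q, scale = dynamic_quantize(acts)
deq = [v * scale for v in q]             # dequantize; small rounding error remains
```

The largest element survives exactly (it lands on the representable maximum), while smaller elements pick up a little rounding error, which is the usual trade-off of per-tensor dynamic scaling.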
Format: Safetensors · 8B params · Tensor types: BF16, F8_E4M3
