Apertus-8B-Instruct-2509-SPINQUANT-FP8_dynamic

This is an FP8 dynamic quantization of swiss-ai/Apertus-8B-Instruct-2509, produced with llm-compressor.

It additionally applies the SpinQuant transformation, which rotates weights and activations with orthogonal (Hadamard) matrices. Because the rotations are orthogonal, the network's outputs are unchanged, but outlier channels are spread across dimensions, which makes the tensors easier to quantize accurately.
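The key property behind SpinQuant can be shown in a few lines: for any orthogonal matrix R, rotating the activations by R and the weights by Rᵀ leaves a linear layer's output unchanged, since x R Rᵀ W = x W. The sketch below is illustrative pure Python (not llm-compressor code), using a normalized 4×4 Hadamard matrix built with Sylvester's construction; the matrix sizes and values are made up for the demo.

```python
def sylvester_hadamard(n):
    """Build the n x n (unnormalized) Hadamard matrix; n must be a power of two."""
    H = [[1.0]]
    while len(H) < n:
        # Sylvester step: H -> [[H, H], [H, -H]]
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def matmul(A, B):
    """Plain nested-list matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n = 4
scale = n ** -0.5                       # normalize so the rotation is orthogonal
R = [[v * scale for v in row] for row in sylvester_hadamard(n)]
Rt = [list(col) for col in zip(*R)]     # transpose of R

x = [[1.0, -2.0, 0.5, 3.0]]             # a 1x4 activation (made-up values)
W = [[0.2, -1.0], [0.7, 0.1],
     [-0.3, 0.4], [1.5, -0.6]]          # a 4x2 weight (made-up values)

y_ref = matmul(x, W)                    # original layer output
y_rot = matmul(matmul(x, R),            # rotated activation ...
               matmul(Rt, W))           # ... times rotated weight: identical output
```

Because the rotation can be fused into the neighboring weight matrices offline, it adds no inference cost, which is why SpinQuant's R1/R2 rotations can be applied before quantization "for free".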

Quantization Details

  • Quantization Scheme: FP8_dynamic
  • Transformation: SpinQuant with Hadamard rotations (R1 & R2)
  • Method: Weights and activations quantized to FP8 (E4M3); activation scales computed dynamically at runtime
  • Targets: All Linear layers
  • Ignored Layers: lm_head (kept in higher precision for better output quality)
  • Tool: llm-compressor
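To make the "FP8_dynamic" scheme above concrete, the sketch below simulates dynamic FP8 (E4M3) quantization of one activation tensor in pure Python. This is an illustration, not the llm-compressor implementation: real kernels work on full tensors with hardware FP8 types, and this sketch ignores E4M3 subnormals. E4M3 keeps 4 significant binary digits (1 implicit + 3 mantissa bits) and saturates at 448.0.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value

def round_to_e4m3(x: float) -> float:
    """Round to 4 significant binary digits (normal-range E4M3 behaviour;
    subnormals are ignored in this sketch)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e with 0.5 <= |m| < 1
    return math.copysign(round(abs(m) * 16) / 16, x) * 2.0 ** e

def dynamic_quantize(values):
    """Per-tensor dynamic scaling: the current max |x| maps onto 448.0.
    'Dynamic' means this scale is recomputed for every activation tensor
    at inference time rather than calibrated offline."""
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / FP8_E4M3_MAX
    q = [round_to_e4m3(min(FP8_E4M3_MAX, max(-FP8_E4M3_MAX, v / scale)))
         for v in values]
    return q, scale

acts = [0.3, -3.0, 12.0, -0.25]          # made-up activation values
q, scale = dynamic_quantize(acts)
deq = [v * scale for v in q]             # dequantize; small rounding error remains
```

The largest element survives exactly (it lands on the representable maximum), while smaller elements pick up a little rounding error, which is the usual trade-off of per-tensor dynamic scaling.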
Format: Safetensors · 8B params · Tensor types: BF16, F8_E4M3
