# Apertus-8B-Instruct-2509-SPINQUANT-FP8_dynamic
This is an FP8 dynamically quantized version of swiss-ai/Apertus-8B-Instruct-2509, produced with llm-compressor.
It additionally applies the SpinQuant transformation, which uses learned/Hadamard rotations to suppress outliers in the weights and activations, reducing quantization error.
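To make "dynamically quantized" concrete: under a dynamic scheme, the quantization scale for activations is computed from each tensor's own values at runtime, with no offline calibration data. The NumPy sketch below illustrates only that per-tensor dynamic scaling; for simplicity it rounds to an integer grid, whereas real FP8 kernels round to the non-uniform E4M3 grid (whose largest finite value, 448.0, supplies the bound used here).

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_dynamic_quant(x: np.ndarray):
    """Per-tensor dynamic quantization: the scale is derived from the
    tensor's own max at call time, so no calibration pass is needed."""
    scale = np.abs(x).max() / E4M3_MAX
    # Simplified: round to an integer grid; real FP8 rounds to the E4M3 grid.
    q = np.clip(np.round(x / scale), -E4M3_MAX, E4M3_MAX)
    return q, scale


def fp8_dequant(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale


x = np.random.randn(4, 8).astype(np.float32)
q, s = fp8_dynamic_quant(x)
x_hat = fp8_dequant(q, s)
err = np.abs(x - x_hat).max()  # bounded by 0.5 * scale
```

Because the scale maps the tensor's max onto the format's max, the round-trip error is bounded by half a quantization step; outliers inflate the scale and hence the error for everything else, which is exactly the problem the SpinQuant rotations mitigate.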
## Quantization Details
- Quantization Scheme: FP8_dynamic
- Transformation: SpinQuant with Hadamard rotations (R1 & R2)
- Method: Dynamic quantization of weights and activations to FP8 format
- Targets: All Linear layers
- Ignored Layers: `lm_head` (kept in higher precision for better output quality)
- Tool: llm-compressor
## Model tree for sevri/Apertus-8B-Instruct-2509-SPINQUANT-FP8_dynamic

- Base model: swiss-ai/Apertus-8B-2509
- Finetuned: swiss-ai/Apertus-8B-Instruct-2509