Fullstop Punctuation Multilingual (GGUF)

GGUF conversion of oliverguhr/fullstop-punctuation-multilang-large for use with CrispASR.

Adds punctuation to unpunctuated ASR output. Supports English, German, French, and Italian, and emits ASCII punctuation marks.

Model Details

  • Architecture: XLM-RoBERTa-large (24 layers, d=1024, 16 heads, d_ffn=4096, GELU)
  • Parameters: ~560M
  • Classifier: Linear(1024, 6), one output per punctuation class
  • Labels: none, . (period), , (comma), ? (question), - (dash), : (colon)
  • Vocabulary: SentencePiece (250,002 tokens)
  • Max sequence: 512 tokens (longer inputs are chunked automatically)
  • Languages: English, German, French, Italian
  • License: MIT
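
The 512-token limit means longer transcripts have to be split before they reach the model. The card does not say how CrispASR chunks its input, so the following is only a minimal sketch of one common approach, overlapping fixed-size windows. The function name `chunk_token_ids` and the `overlap` default of 64 are hypothetical and not taken from CrispASR.

```python
def chunk_token_ids(token_ids, max_len=512, overlap=64):
    """Split a token id list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that words near a
    boundary are seen with context on both sides. Both the function
    and the overlap value are illustrative assumptions.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    stride = max_len - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + max_len])
        # Stop once the current window reaches the end of the input.
        if start + max_len >= len(token_ids):
            break
        start += stride
    return chunks


if __name__ == "__main__":
    ids = list(range(1000))  # stand-in for real SentencePiece token ids
    for chunk in chunk_token_ids(ids):
        print(len(chunk), chunk[0], chunk[-1])
```

A real implementation would also reserve room for the special start and end tokens and decide which window's prediction to keep for the overlapping tokens.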

Usage with CrispASR

crispasr --backend wav2vec2 -m wav2vec2.gguf --punc-model fullstop-punc-q4_k.gguf -f audio.wav
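
To call the same command from a script, a thin Python wrapper is one option. This is a sketch that only reuses the flags shown above; the helper names are hypothetical, and it assumes `crispasr` is on `PATH` and prints the transcript to standard output, which this card does not confirm.

```python
import subprocess


def build_crispasr_command(audio_path,
                           asr_model="wav2vec2.gguf",
                           punc_model="fullstop-punc-q4_k.gguf",
                           backend="wav2vec2"):
    """Build the argument list for the command shown above (hypothetical helper)."""
    return [
        "crispasr",
        "--backend", backend,
        "-m", asr_model,
        "--punc-model", punc_model,
        "-f", audio_path,
    ]


def transcribe(audio_path):
    """Run crispasr and return its stdout.

    Assumes the transcript is written to stdout; adjust if it is not.
    """
    result = subprocess.run(
        build_crispasr_command(audio_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.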

Available Files

File                       Quant       Size    Description
fullstop-punc-f32emb.gguf  F16+F32emb  1.6 GB  Full precision (F32 embeddings)
fullstop-punc-q8_0.gguf    Q8_0        572 MB  High quality
fullstop-punc-q4_k.gguf    Q4_K        254 MB  Recommended
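
As a rough sanity check on the largest file, the size can be estimated from the figures under Model Details. The sketch below assumes that "F16+F32emb" means the token embedding matrix (250,002 × 1024) is stored in F32 and all remaining weights in F16, and it ignores GGUF metadata. Those are assumptions about the file layout, not something this card states.

```python
VOCAB_SIZE = 250_002        # from Model Details
HIDDEN_SIZE = 1024          # from Model Details
TOTAL_PARAMS = 560_000_000  # "~560M", so the result is approximate


def estimate_f32emb_size_bytes(total_params=TOTAL_PARAMS,
                               vocab_size=VOCAB_SIZE,
                               hidden_size=HIDDEN_SIZE):
    """Estimate file size with F32 token embeddings and F16 everything else."""
    embedding_params = vocab_size * hidden_size      # about 256M parameters
    other_params = total_params - embedding_params   # about 304M parameters
    return embedding_params * 4 + other_params * 2   # 4 bytes F32, 2 bytes F16


size_gb = estimate_f32emb_size_bytes() / 1e9
print(f"{size_gb:.2f} GB")  # prints "1.63 GB"
```

Under these assumptions the estimate is consistent with the 1.6 GB listed for fullstop-punc-f32emb.gguf, with the embedding matrix alone accounting for roughly 1.0 GB.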

Example

Input: and so my fellow americans ask not what your country can do for you ask what you can do for your country

Output: And so my fellow americans ask not what your country can do for you, ask what you can do for your country.
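
The step from classifier output to punctuated text can be sketched as follows. This is an illustration and not CrispASR's implementation: it assumes one 6-value logit vector per word, assumes the class indices follow the order of the Labels list above, and uses made-up logits rather than real model output.

```python
# Index order assumed to match the Labels list: none . , ? - :
LABELS = ["", ".", ",", "?", "-", ":"]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def punctuate(words, logits_per_word):
    """Append the highest-scoring punctuation mark to each word."""
    if len(words) != len(logits_per_word):
        raise ValueError("need exactly one logit vector per word")
    out = []
    for word, logits in zip(words, logits_per_word):
        out.append(word + LABELS[argmax(logits)])
    return " ".join(out)


if __name__ == "__main__":
    words = ["for", "you", "ask", "country"]
    # Toy logits chosen by hand for illustration only.
    logits = [
        [5, 0, 0, 0, 0, 0],  # none
        [0, 0, 5, 0, 0, 0],  # comma
        [5, 0, 0, 0, 0, 0],  # none
        [0, 5, 0, 0, 0, 0],  # period
    ]
    print(punctuate(words, logits))  # prints "for you, ask country."
```

The sketch covers only the six punctuation classes. It does not handle capitalization or words that the tokenizer splits into several subword tokens.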

Conversion

python models/convert-fullstop-punc-to-gguf.py \
  --input oliverguhr/fullstop-punctuation-multilang-large \
  --output fullstop-punc.gguf

Original Model

oliverguhr/fullstop-punctuation-multilang-large
