# T5 Grammar Correction - CoreML
CoreML conversion of flexudy/t5-small-wav2vec2-grammar-fixer for on-device iOS grammar correction.
## Model Description
This model is optimized for correcting speech-to-text transcription output (see the example below):
- Fixes grammar and spelling errors
- Adds proper punctuation and capitalization
- Removes filler words (um, uh, etc.)
- Works well with spoken English
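
For illustration only (a hypothetical input and output, not captured from the model), a correction might look like:

```text
Input:  "um so i think we should uh move the meeting to thursday"
Output: "I think we should move the meeting to Thursday."
```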
## Files

| File | Description |
|---|---|
| `encoder.mlmodelc/` | Compiled T5 encoder model |
| `decoder.mlmodelc/` | Compiled T5 decoder model |
| `vocab.json` | SentencePiece vocabulary (32,000 tokens) |
## Usage

### Model Specifications
- Input Length: Up to 128 tokens (padded; see the sketch below)
- Output Length: Up to 64 tokens (generated autoregressively)
- Vocab Size: 32,000 tokens
- Model Size: ~60MB total
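
The fixed lengths above imply padding on the Swift side. A minimal sketch, assuming the text has already been tokenized to ids and using T5's conventional PAD id 0 and EOS id 1 (the `makeEncoderInputs` name and `tokenIds` parameter are illustrative, not part of the shipped model):

```swift
import CoreML

// Pad a tokenized input to the fixed encoder length of 128.
// Assumes `tokenIds` already ends with the EOS token (id 1); T5's PAD id is 0.
func makeEncoderInputs(tokenIds: [Int], maxLength: Int = 128) throws -> (MLMultiArray, MLMultiArray) {
    let ids = Array(tokenIds.prefix(maxLength))
    let inputIds = try MLMultiArray(shape: [1, NSNumber(value: maxLength)], dataType: .int32)
    let attentionMask = try MLMultiArray(shape: [1, NSNumber(value: maxLength)], dataType: .int32)
    for i in 0..<maxLength {
        inputIds[i] = NSNumber(value: i < ids.count ? ids[i] : 0)   // PAD = 0
        attentionMask[i] = NSNumber(value: i < ids.count ? 1 : 0)   // 1 = real token, 0 = padding
    }
    return (inputIds, attentionMask)
}
```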
### Compute Units

**Important:** This model must run on CPU only due to architectural constraints:
| Compute Unit | Status | Reason |
|---|---|---|
| CPU | Works | Full compatibility |
| GPU (Metal) | Not available | Metal not accessible from background apps |
| Neural Engine (ANE) | Not compatible | T5 relative position bias requires dynamic shape computation |
The model runs on CPU in ~1-3 seconds per correction on modern iPhones (A12+).
### Swift Example
```swift
import CoreML

// Load models with a CPU-only configuration
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly

// encoderURL / decoderURL point at the compiled .mlmodelc bundles
let encoder = try MLModel(contentsOf: encoderURL, configuration: config)
let decoder = try MLModel(contentsOf: decoderURL, configuration: config)

// Encoder inference
// inputIdsArray / attentionMaskArray: [1, 128] int32 arrays, padded as described above
let encoderInput = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputIdsArray),
    "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
])
let encoderOutput = try encoder.prediction(from: encoderInput)
let hiddenStates = encoderOutput.featureValue(for: "last_hidden_state")!.multiArrayValue!

// Decoder inference (autoregressive greedy decoding)
var generatedIds: [Int] = [0] // Start with the PAD token (id 0), T5's decoder start token
for _ in 0..<64 {
    // Rebuild the decoder input from the tokens generated so far
    // (alternatively, pad to the fixed length of 64; see Performance Tips)
    let decoderIdsArray = try MLMultiArray(shape: [1, NSNumber(value: generatedIds.count)], dataType: .int32)
    for (i, id) in generatedIds.enumerated() {
        decoderIdsArray[i] = NSNumber(value: id)
    }

    let decoderInput = try MLDictionaryFeatureProvider(dictionary: [
        "decoder_input_ids": MLFeatureValue(multiArray: decoderIdsArray),
        "encoder_hidden_states": MLFeatureValue(multiArray: hiddenStates),
        "encoder_attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])
    let decoderOutput = try decoder.prediction(from: decoderInput)
    // Assumes the converted decoder exposes its output feature as "logits"
    let logits = decoderOutput.featureValue(for: "logits")!.multiArrayValue!

    // Greedy decoding: take the highest-scoring token at the last position
    let nextToken = argmax(logits, at: generatedIds.count - 1)
    if nextToken == 1 { break } // EOS token (</s>)
    generatedIds.append(nextToken)
}
```
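
The `argmax` call above is not defined by this card; a minimal helper, assuming the decoder logits come back with shape `[1, sequence_length, vocab_size]`, might look like:

```swift
import CoreML

// Greedy argmax over the vocabulary dimension at one sequence position.
// Assumes logits shape [1, sequence_length, vocab_size].
func argmax(_ logits: MLMultiArray, at position: Int) -> Int {
    let vocabSize = logits.shape[2].intValue
    var bestId = 0
    var bestScore = -Float.infinity
    for v in 0..<vocabSize {
        let index: [NSNumber] = [0, NSNumber(value: position), NSNumber(value: v)]
        let score = logits[index].floatValue
        if score > bestScore {
            bestScore = score
            bestId = v
        }
    }
    return bestId
}
```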
## Technical Details

### Why CPU Only?

T5 uses relative position biases that require dynamic shape computation at inference time, and the deployment scenario adds its own restriction. For the converted CoreML model:

- **GPU (Metal):** The main app that hosts the CoreML model runs in the background when a keyboard extension triggers inference, and iOS does not allow background apps to submit GPU work.
- **Neural Engine:** Requires models with fixed input shapes or enumerated shape options. T5's relative position bias computes bucket indices dynamically based on sequence length, which cannot be traced as static operations (see the sketch below).
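
To make the dynamic-shape problem concrete, here is a simplified sketch of the bucket computation behind T5's relative position bias (loosely following the reference Hugging Face implementation, with T5's default 32 buckets and max distance 128). The bucket index is a function of the runtime distance between positions, which is exactly what cannot be expressed as a static, fixed-shape graph:

```swift
import Foundation

// Simplified sketch of T5's bidirectional relative-position bucketing.
// The result depends on the runtime distance |keyPos - queryPos|,
// so the indices cannot be baked into a fixed-shape graph.
func relativePositionBucket(queryPos: Int, keyPos: Int,
                            numBuckets: Int = 32, maxDistance: Int = 128) -> Int {
    let halfBuckets = numBuckets / 2
    var bucket = 0
    var relativePosition = keyPos - queryPos
    if relativePosition > 0 { bucket += halfBuckets } // one half for "key after query"
    relativePosition = abs(relativePosition)

    let maxExact = halfBuckets / 2
    if relativePosition < maxExact {
        return bucket + relativePosition // small distances get exact buckets
    }
    // Larger distances are mapped logarithmically into the remaining buckets
    let numerator = log(Double(relativePosition) / Double(maxExact))
    let denominator = log(Double(maxDistance) / Double(maxExact))
    let largeBucket = maxExact + Int(numerator / denominator * Double(halfBuckets - maxExact))
    return bucket + min(largeBucket, halfBuckets - 1)
}
```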
### Performance Tips

- Pad inputs to fixed lengths (128 for encoder, 64 for decoder) even though the CPU can handle dynamic shapes
- Use greedy decoding (argmax) for the fastest inference
- Skip very short texts (< 3 characters) - they do not need correction (see the sketch below)
- For better performance on newer devices, consider Apple Foundation Models (iOS 26+, A17 Pro+)
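
As a small illustration of the short-text guard (the `correctGrammar` helper is hypothetical and stands in for the encoder/decoder pipeline shown above):

```swift
import Foundation

// Skip trivially short inputs before paying the ~1-3 s CPU cost per correction.
func correctIfWorthwhile(_ text: String) throws -> String {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    guard trimmed.count >= 3 else { return text } // too short to need correction
    return try correctGrammar(trimmed) // hypothetical: greedy-decodes with the models above
}
```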
## Attribution
This CoreML model is a conversion of:
- Base Model: flexudy/t5-small-wav2vec2-grammar-fixer
- Original Architecture: T5 by Google Research
- Converted by: Good Pixel Ltd using coremltools
## License
Apache 2.0 - Same as the original T5 model.
## Citation

```bibtex
@article{raffel2020exploring,
  title={Exploring the limits of transfer learning with a unified text-to-text transformer},
  author={Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J},
  journal={Journal of Machine Learning Research},
  volume={21},
  pages={1--67},
  year={2020}
}
```