
T5 Grammar Correction - CoreML

CoreML conversion of flexudy/t5-small-wav2vec2-grammar-fixer for on-device iOS grammar correction.

Model Description

This model is optimized for correcting speech-to-text transcription output:

  • Fixes grammar and spelling errors
  • Adds proper punctuation and capitalization
  • Removes filler words (um, uh, etc.)
  • Works well with spoken English
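
The upstream flexudy model card wraps inputs in a `fix: { ... } </s>` template before tokenization; the helper below sketches that preprocessing step. Treat the exact template string as an assumption to verify against the original model card.

```swift
import Foundation

// Hypothetical helper: wraps a raw transcript in the prompt template the
// upstream flexudy model expects ("fix: { <text> } </s>").
// Verify the template against flexudy/t5-small-wav2vec2-grammar-fixer.
func makePrompt(_ rawTranscript: String) -> String {
    let trimmed = rawTranscript.trimmingCharacters(in: .whitespacesAndNewlines)
    return "fix: { \(trimmed) } </s>"
}
```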

Files

File               Description
encoder.mlmodelc/  Compiled T5 encoder model
decoder.mlmodelc/  Compiled T5 decoder model
vocab.json         SentencePiece vocabulary (32,000 tokens)

Usage

Model Specifications

  • Input Length: Up to 128 tokens (padded)
  • Output Length: Up to 64 tokens (generated autoregressively)
  • Vocab Size: 32,000 tokens
  • Model Size: ~60MB total

Compute Units

Important: This model must run on CPU only due to architectural constraints:

Compute Unit         Status          Reason
CPU                  Works           Full compatibility
GPU (Metal)          Not available   Metal is not accessible from background apps
Neural Engine (ANE)  Not compatible  T5's relative position bias requires dynamic shape computation

The model runs on CPU in ~1-3 seconds per correction on modern iPhones (A12+).

Swift Example

import CoreML

// Load models with CPU-only configuration
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly

let encoder = try MLModel(contentsOf: encoderURL, configuration: config)
let decoder = try MLModel(contentsOf: decoderURL, configuration: config)

// Encoder inference
let encoderInput = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputIdsArray),
    "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
])
let encoderOutput = try encoder.prediction(from: encoderInput)
guard let hiddenStates = encoderOutput.featureValue(for: "last_hidden_state")?.multiArrayValue else {
    fatalError("Encoder output missing last_hidden_state")
}

// Decoder inference (autoregressive greedy decoding)
var generatedIds: [Int32] = [0]  // T5 uses the PAD token (id 0) as the decoder start token
for _ in 0..<64 {
    // Rebuild the decoder input ids from everything generated so far
    let decoderIdsArray = try MLMultiArray(shape: [1, NSNumber(value: generatedIds.count)], dataType: .int32)
    for (i, id) in generatedIds.enumerated() {
        decoderIdsArray[i] = NSNumber(value: id)
    }
    let decoderInput = try MLDictionaryFeatureProvider(dictionary: [
        "decoder_input_ids": MLFeatureValue(multiArray: decoderIdsArray),
        "encoder_hidden_states": MLFeatureValue(multiArray: hiddenStates),
        "encoder_attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])
    let decoderOutput = try decoder.prediction(from: decoderInput)
    // Output feature name may differ depending on how the model was converted
    let logits = decoderOutput.featureValue(for: "logits")!.multiArrayValue!
    let nextToken = argmax(logits)  // greedy: pick the highest-scoring token
    if nextToken == 1 { break }     // T5's EOS token id is 1
    generatedIds.append(nextToken)
}
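
The `argmax(logits)` call above is left abstract. A minimal greedy-decoding helper might look like the following, shown on a plain `[Float]` slice (the vocab logits at the last decoder position) rather than the raw `MLMultiArray`, for clarity:

```swift
/// Greedy decoding step: returns the index of the largest logit.
/// In the loop above this would be applied to the last position's
/// vocabulary logits (32,000 values).
func argmax(_ logits: [Float]) -> Int32 {
    var bestIndex = 0
    var bestValue = -Float.infinity
    for (i, v) in logits.enumerated() where v > bestValue {
        bestIndex = i
        bestValue = v
    }
    return Int32(bestIndex)
}
```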

Technical Details

Why CPU Only?

T5 uses relative position biases that require dynamic shape computation at inference time. When converting to CoreML:

  1. GPU (Metal): The main app that hosts the CoreML model runs in the background when a keyboard extension triggers inference. iOS does not allow background apps to submit GPU work.

  2. Neural Engine: Requires models with fixed input shapes or enumerated shape options. T5 relative position bias computes indices dynamically based on sequence length, which cannot be traced as static operations.

Performance Tips

  • Pad inputs to fixed lengths (128 for encoder, 64 for decoder) even though CPU can handle dynamic shapes
  • Use greedy decoding (argmax) for fastest inference
  • Skip very short texts (< 3 characters) - they do not need correction
  • For better performance on newer devices, consider Apple Foundation Models (iOS 26+, A17 Pro+)
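
The short-text skip from the tips above can be a simple pre-flight guard before any model loading or inference (names here are illustrative):

```swift
import Foundation

/// Hypothetical pre-flight check: skip inference entirely for inputs too
/// short to benefit from correction (threshold from the tips above).
func needsCorrection(_ text: String) -> Bool {
    text.trimmingCharacters(in: .whitespacesAndNewlines).count >= 3
}
```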

Attribution

This CoreML model is a conversion of flexudy/t5-small-wav2vec2-grammar-fixer, a fine-tune of Google's T5-small.

License

Apache 2.0 - Same as the original T5 model.

Citation

@article{raffel2020exploring,
  title={Exploring the limits of transfer learning with a unified text-to-text transformer},
  author={Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J},
  journal={Journal of Machine Learning Research},
  volume={21},
  pages={1--67},
  year={2020}
}