
T5 Grammar Correction - CoreML

CoreML conversion of flexudy/t5-small-wav2vec2-grammar-fixer for on-device iOS grammar correction.

Model Description

This model is optimized for correcting speech-to-text transcription output:

  • Fixes grammar and spelling errors
  • Adds proper punctuation and capitalization
  • Removes filler words (um, uh, etc.)
  • Works well with spoken English
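
The upstream flexudy model card wraps inputs in a `fix: { ... } </s>` template before tokenization; the helper below sketches that preprocessing step. Treat the exact template string as an assumption to verify against the original model card.

```swift
import Foundation

// Hypothetical helper: wraps a raw transcript in the prompt template the
// upstream flexudy model expects ("fix: { <text> } </s>").
// Verify the template against flexudy/t5-small-wav2vec2-grammar-fixer.
func makePrompt(_ rawTranscript: String) -> String {
    let trimmed = rawTranscript.trimmingCharacters(in: .whitespacesAndNewlines)
    return "fix: { \(trimmed) } </s>"
}
```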

Files

File               Description
encoder.mlmodelc/  Compiled T5 encoder model
decoder.mlmodelc/  Compiled T5 decoder model
vocab.json         SentencePiece vocabulary (32,000 tokens)

Usage

Model Specifications

  • Input Length: Up to 128 tokens (padded)
  • Output Length: Up to 64 tokens (generated autoregressively)
  • Vocab Size: 32,000 tokens
  • Model Size: ~60MB total

Compute Units

Important: This model must run on CPU only due to architectural constraints:

Compute Unit         Status          Reason
CPU                  Works           Full compatibility
GPU (Metal)          Not available   Metal is not accessible from background apps
Neural Engine (ANE)  Not compatible  T5's relative position bias requires dynamic shape computation

The model runs on CPU in ~1-3 seconds per correction on modern iPhones (A12+).

Swift Example

import CoreML

// Load models with CPU-only configuration
let config = MLModelConfiguration()
config.computeUnits = .cpuOnly

let encoder = try MLModel(contentsOf: encoderURL, configuration: config)
let decoder = try MLModel(contentsOf: decoderURL, configuration: config)

// Encoder inference
let encoderInput = try MLDictionaryFeatureProvider(dictionary: [
    "input_ids": MLFeatureValue(multiArray: inputIdsArray),
    "attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
])
let encoderOutput = try encoder.prediction(from: encoderInput)
guard let hiddenStates = encoderOutput.featureValue(for: "last_hidden_state")?.multiArrayValue else {
    fatalError("Encoder output missing last_hidden_state")
}

// Decoder inference (autoregressive greedy decoding)
var generatedIds: [Int32] = [0]  // T5 uses the PAD token (id 0) as the decoder start token
for _ in 0..<64 {
    // Rebuild the decoder input ids from everything generated so far
    let decoderIdsArray = try MLMultiArray(shape: [1, NSNumber(value: generatedIds.count)], dataType: .int32)
    for (i, id) in generatedIds.enumerated() {
        decoderIdsArray[i] = NSNumber(value: id)
    }
    let decoderInput = try MLDictionaryFeatureProvider(dictionary: [
        "decoder_input_ids": MLFeatureValue(multiArray: decoderIdsArray),
        "encoder_hidden_states": MLFeatureValue(multiArray: hiddenStates),
        "encoder_attention_mask": MLFeatureValue(multiArray: attentionMaskArray)
    ])
    let decoderOutput = try decoder.prediction(from: decoderInput)
    // Output feature name may differ depending on how the model was converted
    let logits = decoderOutput.featureValue(for: "logits")!.multiArrayValue!
    let nextToken = argmax(logits)  // greedy: pick the highest-scoring token
    if nextToken == 1 { break }     // T5's EOS token id is 1
    generatedIds.append(nextToken)
}
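
The `argmax(logits)` call above is left abstract. A minimal greedy-decoding helper might look like the following, shown on a plain `[Float]` slice (the vocab logits at the last decoder position) rather than the raw `MLMultiArray`, for clarity:

```swift
/// Greedy decoding step: returns the index of the largest logit.
/// In the loop above this would be applied to the last position's
/// vocabulary logits (32,000 values).
func argmax(_ logits: [Float]) -> Int32 {
    var bestIndex = 0
    var bestValue = -Float.infinity
    for (i, v) in logits.enumerated() where v > bestValue {
        bestIndex = i
        bestValue = v
    }
    return Int32(bestIndex)
}
```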

Technical Details

Why CPU Only?

T5 uses relative position biases that require dynamic shape computation at inference time. When converting to CoreML:

  1. GPU (Metal): The main app that hosts the CoreML model runs in the background when a keyboard extension triggers inference. iOS does not allow background apps to submit GPU work.

  2. Neural Engine: Requires models with fixed input shapes or enumerated shape options. T5 relative position bias computes indices dynamically based on sequence length, which cannot be traced as static operations.

Performance Tips

  • Pad inputs to fixed lengths (128 for encoder, 64 for decoder) even though CPU can handle dynamic shapes
  • Use greedy decoding (argmax) for fastest inference
  • Skip very short texts (< 3 characters) - they do not need correction
  • For better performance on newer devices, consider Apple Foundation Models (iOS 26+, A17 Pro+)
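
The short-text skip from the tips above can be a simple pre-flight guard before any model loading or inference (names here are illustrative):

```swift
import Foundation

/// Hypothetical pre-flight check: skip inference entirely for inputs too
/// short to benefit from correction (threshold from the tips above).
func needsCorrection(_ text: String) -> Bool {
    text.trimmingCharacters(in: .whitespacesAndNewlines).count >= 3
}
```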

Attribution

This CoreML model is a conversion of flexudy/t5-small-wav2vec2-grammar-fixer, a fine-tune of Google's T5-small.

License

Apache 2.0 - Same as the original T5 model.

Citation

@article{raffel2020exploring,
  title={Exploring the limits of transfer learning with a unified text-to-text transformer},
  author={Raffel, Colin and Shazeer, Noam and Roberts, Adam and Lee, Katherine and Narang, Sharan and Matena, Michael and Zhou, Yanqi and Li, Wei and Liu, Peter J},
  journal={Journal of Machine Learning Research},
  volume={21},
  pages={1--67},
  year={2020}
}