indiejoseph/wikipedia-zh-yue-filtered
How to use indiejoseph/bart-base-cantonese with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("indiejoseph/bart-base-cantonese")
model = AutoModelForSeq2SeqLM.from_pretrained("indiejoseph/bart-base-cantonese")

This model is a continued pre-training of fnlp/bart-base-chinese on a filtered Cantonese Common Crawl dataset of 472M tokens.
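Once the tokenizer and model are loaded as above, text can be generated with the standard `generate` method. The following is a minimal sketch, not the author's documented usage: the helper names `clean_decoded` and `generate_text` are hypothetical, and it assumes the tokenizer is BERT-style (so it also returns `token_type_ids`, which are not passed on to BART) and decodes Chinese text as space-separated characters.

```python
def clean_decoded(text: str) -> str:
    """Remove the spaces a BERT-style tokenizer inserts between decoded characters.

    Note: this also removes spaces between Latin words, which is acceptable
    for mostly-Chinese output but is an assumption of this sketch.
    """
    return text.replace(" ", "")


def generate_text(prompt, tokenizer, model, max_new_tokens=32):
    """Encode a prompt, run greedy generation, and return the decoded string."""
    inputs = tokenizer(prompt, return_tensors="pt")
    # Pass only input_ids and attention_mask: a BERT-style tokenizer also
    # returns token_type_ids, which a BART model does not use.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return clean_decoded(decoded)
```

With the `tokenizer` and `model` objects from the snippet above, a call such as `generate_text("香港係一個[MASK]城市", tokenizer, model)` would ask the model to reconstruct the masked sentence. The prompt is an illustrative example, and the output depends on the checkpoint.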
The tokenizer extends the BERT tokenizer from fnlp/bart-base-chinese with 100 additional Chinese characters commonly found in Cantonese.
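One way to see the effect of the extended vocabulary is to check which characters of a sample sentence fall back to the unknown token. The sketch below is illustrative only: `unknown_characters` is a hypothetical helper, and it relies on the standard `unk_token_id` attribute and `convert_tokens_to_ids` method of Transformers tokenizers.

```python
def unknown_characters(text, tokenizer):
    """Return the distinct non-space characters that map to the unknown token."""
    unk_id = tokenizer.unk_token_id
    missing = []
    for ch in text:
        # Skip whitespace and characters already reported.
        if ch.isspace() or ch in missing:
            continue
        # A character absent from the vocabulary resolves to the unk id.
        if tokenizer.convert_tokens_to_ids(ch) == unk_id:
            missing.append(ch)
    return missing
```

Running this helper on the same Cantonese sentence with the tokenizer from fnlp/bart-base-chinese and with the tokenizer from indiejoseph/bart-base-cantonese should, under the description above, report fewer unknown characters for the latter. Which characters are covered is not listed here, so the result has to be checked against the actual vocabularies.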
More information needed
The following hyperparameters were used during training: