OpenWebText BPE (Collection): BPE tokenizers with vocabulary sizes from 1k to 131k, trained on OpenWebText, along with a pre-tokenized copy of the dataset for each tokenizer. • 16 items • Updated 5 days ago