---
license: apple-ascl
tags:
- open-lm
- temporal
- tic-lm
- causal-lm
library_name: transformers
pipeline_tag: text-generation
---
# Open LM 1B — Knowledge Cutoff January 2019

This is a HuggingFace-format conversion of the Apple Open LM 1B oracle model from the TiC-LM (Time-Continual Language Modeling) project, trained with a knowledge cutoff of January 2019.
## Model Details
| Property | Value |
|---|---|
| Architecture | LLaMA-style (pre-norm, SwiGLU, RoPE) |
| Parameters | ~1.4B |
| Training tokens | 220B |
| Knowledge cutoff | January 2019 |
| Vocab size | 50,432 |
| Context length | 2,048 |
| Original format | Apple Open LM |
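As a rough sanity check, the ~1.4B figure in the table can be reproduced from typical Open LM 1B dimensions. The sketch below assumes `d_model=2048`, 24 layers, a SwiGLU FFN width of 5,632 (≈ 8/3 · d_model, rounded up to a multiple of 256), and untied embeddings; these dimensions are assumptions for illustration, not values read from the checkpoint:

```python
# Rough parameter count for a LLaMA-style model matching the table above.
# All dimensions below are assumptions, not read from the checkpoint.
d_model = 2048
n_layers = 24
vocab = 50_432
ffn = 5_632  # ~8/3 * d_model, rounded up to a multiple of 256

attn = 4 * d_model * d_model      # Q, K, V, O projections
mlp = 3 * d_model * ffn           # SwiGLU: gate, up, and down projections
per_layer = attn + mlp
embeddings = 2 * vocab * d_model  # input embedding + untied LM head

total = n_layers * per_layer + embeddings
print(f"{total / 1e9:.2f}B parameters")  # 1.44B parameters
```

Norm and bias parameters are omitted since they contribute well under 1% of the total.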
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-1b-201901",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```
## Conversion Notes
- Converted from the original Open LM `.pt` checkpoint to a custom `OpenLMForCausalLM` format.
- Uses LayerNorm (not RMSNorm) to match the original Open LM training.
- Includes QK norm (LayerNorm on Q and K projections before attention).
- Architecture dimensions are auto-detected from checkpoint weights.
- Requires `trust_remote_code=True` when loading.
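The QK-norm point above can be illustrated in isolation. The following is a minimal single-head sketch in plain PyTorch, assuming the general pattern described in the notes (LayerNorm applied to the query and key projections before the attention scores), not the repository's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    """Illustrative single-head self-attention with QK norm (not the real code)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        # LayerNorm (not RMSNorm) on Q and K, per the conversion notes.
        self.q_norm = nn.LayerNorm(d_model)
        self.k_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.q_norm(self.q_proj(x))  # normalize queries before attention
        k = self.k_norm(self.k_proj(x))  # normalize keys before attention
        v = self.v_proj(x)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(attn)

x = torch.randn(1, 16, 64)  # (batch, seq_len, d_model)
y = QKNormAttention(64)(x)
print(y.shape)  # torch.Size([1, 16, 64])
```

Normalizing Q and K bounds the scale of the attention logits, which is commonly done to stabilize training.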
## Citation

```bibtex
@article{jain2024ticlm,
  title={Time-Continual Learning from a Streaming Language Model},
  author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
  journal={arXiv preprint arXiv:2410.14660},
  year={2024}
}
```