---
license: apple-ascl
tags:
  - open-lm
  - temporal
  - tic-lm
  - causal-lm
library_name: transformers
pipeline_tag: text-generation
---

# Open LM 1B — Knowledge Cutoff January 2019

This is a Hugging Face-format conversion of the Apple Open LM 1B oracle model from the TiC-LM (Time-Continual Language Modeling) project, trained with a knowledge cutoff of January 2019.

## Model Details

| Property | Value |
|---|---|
| Architecture | LLaMA-style (pre-norm, SwiGLU, RoPE) |
| Parameters | ~1.4B |
| Training tokens | 220B |
| Knowledge cutoff | January 2019 |
| Vocab size | 50,432 |
| Context length | 2,048 |
| Original format | Apple Open LM |
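The SwiGLU feed-forward block listed above can be sketched in a few lines of NumPy. This is a simplified illustration of the general technique; the dimensions and weight names are illustrative and not taken from the checkpoint:

```python
import numpy as np

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: down-project the elementwise product of
    # a SiLU-gated branch and a linear "up" branch.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

# Illustrative dimensions only (not the model's actual sizes).
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal((2, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))
y = swiglu_ffn(x, w_gate, w_up, w_down)
print(y.shape)  # (2, 8)
```

The gated branch replaces the single activation of a classic MLP, which is why LLaMA-style models carry three feed-forward weight matrices instead of two.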

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "dogtooth/open-lm-1b-201901",
    dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Conversion Notes

- Converted from the original Open LM `.pt` checkpoint to a custom `OpenLMForCausalLM` implementation.
- Uses LayerNorm (not RMSNorm) to match the original Open LM training.
- Includes QK norm (LayerNorm applied to the Q and K projections before attention).
- Architecture dimensions are auto-detected from the checkpoint weights.
- Requires `trust_remote_code=True` when loading.
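The QK-norm step noted above can be illustrated with a small NumPy sketch. The shapes and the omission of learned affine parameters are simplifying assumptions for clarity, not details of this checkpoint:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize the last axis to zero mean and unit variance
    # (learned scale/bias omitted for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def qk_norm_attention(q, k, v):
    # QK norm: LayerNorm the query/key projections before the scaled
    # dot-product, which bounds the scale of the attention logits.
    q, k = layer_norm(q), layer_norm(k)
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v

# Illustrative shapes only: (sequence_length, head_dim).
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 16)) for _ in range(3))
out = qk_norm_attention(q, k, v)
print(out.shape)  # (4, 16)
```

Because the normalized queries and keys have unit variance per head dimension, the dot products cannot blow up as training progresses, which is the usual motivation for QK norm.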

## Citation

```bibtex
@article{jain2024ticlm,
  title={Time-Continual Learning from a Streaming Language Model},
  author={Jain, Ameya and Ramesh, Aakanksha and Li, Tianjian and others},
  journal={arXiv preprint arXiv:2410.14660},
  year={2024}
}
```