Paper: Release of Pre-Trained Models for the Japanese Language (arXiv:2404.01657)
A LoRA adapter that instruction-tunes rinna/japanese-gpt-neox-small for Japanese.

Load the base model and apply the LoRA adapter:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_id = "rinna/japanese-gpt-neox-small"
adapter_id = "takehika/rinna-neox-small-ja-it-adapter"

# The base model uses a SentencePiece tokenizer, so load the slow implementation.
tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=False)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the frozen base model, then attach the LoRA adapter weights on top of it.
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id).eval()

def build_prompt(instruction, input_text=""):
    # Build a prompt in the Instruction/Input/Response format used for tuning.
    if input_text:
        return (
            "### Instruction:\n"
            f"{instruction}\n\n"
            "### Input:\n"
            f"{input_text}\n\n"
            "### Response:\n"
        )
    else:
        return (
            "### Instruction:\n"
            f"{instruction}\n\n"
            "### Response:\n"
        )

# "Please give five tips for staying healthy every day."
instruction = "毎日健康的に過ごすコツを5つ教えてください。"
prompt = build_prompt(instruction)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, dropping the prompt.
gen_ids = output_ids[0][inputs["input_ids"].shape[1]:]
generated = tokenizer.decode(gen_ids, skip_special_tokens=True)
print(generated)
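If a standalone model is more convenient for deployment, the LoRA weights can also be folded into the base model. A minimal sketch using peft's merge_and_unload, continuing from the snippet above (the output directory name is only an example):

# Merge the adapter into the base weights so no adapter is needed at load time.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("rinna-neox-small-ja-it-merged")  # example path
tokenizer.save_pretrained("rinna-neox-small-ja-it-merged")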
This adapter was instruction-tuned on llm-jp/llm-jp-instructions and kunishou/databricks-dolly-15k-ja using the following prompt-response format:
### Instruction:
{instruction}

### Input:
{input}

### Response:
{response}
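When a task needs extra context, the Input block is included as well. For example, with the build_prompt helper from the snippet above (the instruction and input strings here are purely illustrative):

# "Summarize the following text." plus a short illustrative passage as the input.
prompt = build_prompt(
    "次の文章を要約してください。",
    "LoRAは、少数の追加パラメータだけを学習することで大規模言語モデルを効率的に微調整する手法です。",
)
print(prompt)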
The LoRA adapter was trained on top of the frozen base model; at inference time it is applied by loading the base model first and then the adapter weights, as in the usage snippet above.
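For reference, a minimal sketch of how a LoRA adapter like this can be set up with peft; the rank, alpha, dropout, and other hyperparameters below are assumptions, not the values actually used for this adapter:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("rinna/japanese-gpt-neox-small")
lora_config = LoraConfig(
    r=8,                                 # assumed LoRA rank
    lora_alpha=16,                       # assumed scaling factor
    lora_dropout=0.05,                   # assumed dropout
    target_modules=["query_key_value"],  # fused QKV projection in GPT-NeoX blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
# Train with the standard causal-LM loss on prompts in the format above, then:
# model.save_pretrained("path/to/adapter")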
rinna/japanese-gpt-neox-small β MIT License
llm-jp/llm-jp-instructions β CC BY 4.0
kunishou/databricks-dolly-15k-ja β CC BY-SA 3.0
This adapter modifies the base model by fine-tuning on the above datasets.
@misc{rinna-japanese-gpt-neox-small,
title = {rinna/japanese-gpt-neox-small},
author = {Zhao, Tianyu and Sawada, Kei},
url = {https://huggingface.co/rinna/japanese-gpt-neox-small}
}
@inproceedings{sawada2024release,
title = {Release of Pre-Trained Models for the {J}apanese Language},
author = {Sawada, Kei and Zhao, Tianyu and Shing, Makoto and Mitsui, Kentaro and Kaga, Akio and Hono, Yukiya and Wakatsuki, Toshiaki and Mitsuda, Koh},
booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
month = {5},
year = {2024},
pages = {13898--13905},
url = {https://aclanthology.org/2024.lrec-main.1213},
note = {\url{https://arxiv.org/abs/2404.01657}}
}
Base model: rinna/japanese-gpt-neox-small