ICLR 2026 LogicReward LoRA Adapter (LLaMA 3.1 8B)
1. Introduction
This repository provides only the LoRA adapter weights for LLaMA 3.1 8B, trained with LLaMA-Factory as part of the LogicReward project.
📄 Paper:
LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision (ICLR 2026)
📦 Model Collection (all variants):
https://huggingface.co/collections/Aiden0526/logicreward
⚠️ Important: This repository does NOT contain the base model weights.
You must separately obtain the base model from Hugging Face.
This model corresponds to one trained variant in the LogicReward series.
See the collection page for other variants (e.g., LogicReward-Qwen3-8B).
2. Model Information
- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Model type: LoRA adapter (PEFT)
- Training framework: LLaMA-Factory
- Training stages: SFT → DPO
- Architecture: Decoder-only Transformer
- Language: English
- License: Inherits the base model license (Llama 3.1 Community License)
Detailed training configuration and datasets are described in the paper.
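Because this repository ships only the PEFT adapter, you can inspect the adapter configuration before downloading the full base weights. A minimal sketch using PEFT's `PeftConfig`; the printed fields are standard PEFT config attributes:

```python
from peft import PeftConfig

# Fetch only the adapter config (small download) to verify compatibility.
config = PeftConfig.from_pretrained("Aiden0526/LogicReward-Llama3.1-8B")
print(config.peft_type)                # expected: LORA
print(config.base_model_name_or_path)  # expected: meta-llama/Meta-Llama-3.1-8B-Instruct
```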
3. How to Use
Installation
```bash
pip install -U transformers peft accelerate
```
Load Base Model + LoRA Adapter
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "Aiden0526/LogicReward-Llama3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)

# Load the base model first, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
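For deployment, you can optionally fold the adapter into the base weights so the PEFT wrapper is no longer needed at inference time. A minimal sketch using PEFT's `merge_and_unload`; the output directory name is illustrative:

```python
# Optional: merge the LoRA weights into the base model for standalone serving.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("LogicReward-Llama3.1-8B-merged")  # illustrative path
tokenizer.save_pretrained("LogicReward-Llama3.1-8B-merged")
```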
Inference Example
prompt = "Explain symbolic reasoning in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=256,
temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Chat / Instruction Format
This adapter expects the LLaMA 3.1 Instruct chat format.
If your framework requires an explicit chat template, use the chat_template.jinja
included in this repository.
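To see the exact prompt string the template produces, you can render it without tokenizing. A quick sketch; the commented output shows the general shape of the LLaMA 3.1 Instruct format (the stock template may additionally prepend a system header):

```python
# Render the chat template to a string instead of token IDs.
prompt_text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt_text)
# Roughly:
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#
# Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```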
4. Citation
If you use this model, please cite:
```bibtex
@inproceedings{logicreward2026,
  title     = {LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision},
  author    = {Jundong Xu and Hao Fei and Huichi Zhou and Xin Quan and Qijun Huang and Shengqiong Wu and William Yang Wang and Mong-Li Lee and Wynne Hsu},
  booktitle = {Proceedings of the International Conference on Learning Representations},
  year      = {2026},
  url       = {https://arxiv.org/abs/2512.18196}
}
```