ICLR 2026 LogicReward LoRA Adapter (LLaMA 3.1 8B)
1. Introduction
This repository provides only the LoRA adapter weights for LLaMA 3.1 8B, trained with LLaMA-Factory as part of the LogicReward project.
📄 Paper:
LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision (ICLR 2026)
📦 Model Collection (all variants):
https://huggingface.co/collections/Aiden0526/logicreward
⚠️ Important: This repository does NOT contain the base model weights.
You must separately obtain the base model from Hugging Face.
This model corresponds to one trained variant in the LogicReward series.
See the collection page for other variants (e.g., LogicReward-Qwen3-8B).
2. Model Information
- Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Model type: LoRA adapter (PEFT)
- Training framework: LLaMA-Factory
- Training stages: SFT → DPO
- Architecture: Decoder-only Transformer
- Language: English
- License: Inherits the base model license (Llama 3.1 Community License)
Detailed training configuration and datasets are described in the paper.
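Because this repository ships only the PEFT adapter, you can inspect the adapter configuration before downloading the full base weights. A minimal sketch using PEFT's `PeftConfig`; the printed fields are standard PEFT config attributes:

```python
from peft import PeftConfig

# Fetch only the adapter config (small download) to verify compatibility.
config = PeftConfig.from_pretrained("Aiden0526/LogicReward-Llama3.1-8B")
print(config.peft_type)                # expected: LORA
print(config.base_model_name_or_path)  # expected: meta-llama/Meta-Llama-3.1-8B-Instruct
```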
3. How to Use
Installation
```bash
pip install -U transformers peft accelerate
```
Load Base Model + LoRA Adapter
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "Aiden0526/LogicReward-Llama3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)

# Load the base model first, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```
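For deployment, you can optionally fold the adapter into the base weights so the PEFT wrapper is no longer needed at inference time. A minimal sketch using PEFT's `merge_and_unload`; the output directory name is illustrative:

```python
# Optional: merge the LoRA weights into the base model for standalone serving.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("LogicReward-Llama3.1-8B-merged")  # illustrative path
tokenizer.save_pretrained("LogicReward-Llama3.1-8B-merged")
```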
Inference Example
prompt = "Explain symbolic reasoning in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=256,
temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Chat / Instruction Format
This adapter expects the LLaMA 3.1 Instruct chat format.
If your framework requires an explicit chat template, use the chat_template.jinja
included in this repository.
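To see the exact prompt string the template produces, you can render it without tokenizing. A quick sketch; the commented output shows the general shape of the LLaMA 3.1 Instruct format (the stock template may additionally prepend a system header):

```python
# Render the chat template to a string instead of token IDs.
prompt_text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt_text)
# Roughly:
# <|begin_of_text|><|start_header_id|>user<|end_header_id|>
#
# Hello<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```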
4. Citation
If you use this model, please cite:
```bibtex
@inproceedings{logicreward2026,
  title     = {LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision},
  author    = {Jundong Xu and Hao Fei and Huichi Zhou and Xin Quan and Qijun Huang and Shengqiong Wu and William Yang Wang and Mong-Li Lee and Wynne Hsu},
  booktitle = {Proceedings of the International Conference on Learning Representations},
  year      = {2026},
  url       = {https://arxiv.org/abs/2512.18196}
}
```