ICLR 2026 LogicReward LoRA Adapter (LLaMA 3.1 8B)

1. Introduction

This repository provides only the LoRA adapter weights for LLaMA 3.1 8B, trained with LLaMA-Factory as part of the LogicReward project.

⚠️ Important: This repository does NOT contain the base model weights.
You must separately obtain the base model from Hugging Face.

This model corresponds to one trained variant in the LogicReward series.
See the collection page for other variants (e.g., LogicReward-Qwen3-8B).


2. Model Information

  • Base model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Model type: LoRA adapter (PEFT)
  • Training framework: LLaMA-Factory
  • Training stages: SFT → DPO
  • Architecture: Decoder-only Transformer
  • Language: English
  • License: Inherits the base model license (Llama 3.1 Community License)

Detailed training configuration and datasets are described in the paper.


3. How to Use

Installation

pip install -U transformers peft accelerate

Load Base Model + LoRA Adapter

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "Aiden0526/LogicReward-Llama3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LogicReward LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
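
Optionally, if you do not need to swap adapters at runtime, you can fold the LoRA weights into the base model to avoid the PEFT indirection at inference time. A minimal sketch using peft's merge_and_unload:

# Optional: merge the LoRA weights into the base model for faster inference.
# The returned object behaves like a plain transformers model.
merged_model = model.merge_and_unload()
merged_model.eval()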

Inference Example

prompt = "Explain symbolic reasoning in simple terms."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,   # required for temperature to take effect
    temperature=0.7,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
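
Note that generate returns the prompt tokens followed by the completion. To print only the newly generated text, slice off the prompt length before decoding:

# Decode only the tokens produced after the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))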

Chat / Instruction Format

This adapter expects the LLaMA 3.1 Instruct chat format.

If you use a custom chat template, follow the chat_template.jinja included in this repository.
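
For example, a minimal sketch that formats a single-turn conversation with the tokenizer's built-in apply_chat_template (assuming the base tokenizer ships the LLaMA 3.1 Instruct template; the system prompt here is purely illustrative):

# Build a LLaMA 3.1 Instruct-style prompt from chat messages.
messages = [
    {"role": "system", "content": "You are a careful logical reasoner."},  # illustrative system prompt
    {"role": "user", "content": "Explain symbolic reasoning in simple terms."},
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model answers as assistant
    return_tensors="pt",
).to(model.device)

chat_outputs = model.generate(
    chat_inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
# Decode only the assistant completion, skipping the prompt tokens.
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[-1]:], skip_special_tokens=True))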


Citation

If you use this model, please cite:

@inproceedings{logicreward2026,
  title     = {LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision},
  author    = {Xu, Jundong and Fei, Hao and Zhou, Huichi and Quan, Xin and Huang, Qijun and Wu, Shengqiong and Wang, William Yang and Lee, Mong-Li and Hsu, Wynne},
  booktitle = {Proceedings of the International Conference on Learning Representations},
  year      = {2026},
  url       = {https://arxiv.org/abs/2512.18196}
}