Llama-qwen3-4b-RPG
Llama-qwen3-4b-RPG is a fine-tuned variant of Qwen3-4B with facebook/research-plan-gen datasets Llama4-maverick generated, optimized for Research Plan Generation (RPG).
The model is designed to generate structured, high-quality research plans for complex scientific and technical tasks across multiple domains.
It is trained by using Unsloth notebook
Key Features
Research-aware generation
Produces clear, structured research plans with goals, methodologies, evaluation criteria, and constraints.Two-stage training
- SFT warm-up for instruction following
- GRPO refinement using custom reward functions
Multi-domain coverage
- Machine Learning
- ArXiv research
- PubMed / biomedical research
Custom chat template Tailored specifically for research-planning tasks rather than generic chat.
Long-form optimized Tuned for long context windows and coherent multi-section outputs.
Training Overview
Base Model
- Qwen3-4B
Dataset
- Research Plan Generation Dataset
- Source:
facebook/research-plan-gen - Structure:
GoalRubricReference Solution
- Source:
Training Strategy
Stage 1: Supervised Fine-Tuning (SFT)
- Learns structured research-plan formatting
- Aligns outputs with rubric-based expectations
Stage 2: GRPO Reinforcement Learning
- Improves plan quality using reward functions
- Encourages:
- Completeness
- Logical structure
- Methodological rigor
- Faithfulness to constraints
Intended Use Cases
- Automated research planning
- Scientific assistant systems
- Academic proposal drafting
- R&D ideation and experiment design
- LLM-based research agents
Limitations
- Not intended for factual verification or citation generation
- Outputs should be reviewed by domain experts
- Optimized for planning, not final paper writing
License
This model follows the license of its base model Qwen3-4B and the dataset used for training.
Please review upstream licenses before commercial use.
Acknowledgements
- Qwen model family
- Facebook Research Plan Generation dataset
- Open-source RLHF / GRPO tooling
Citation
If you use this model in research or products, please cite appropriately.
- Downloads last month
- 22
4-bit