Human-annotated intent dataset and intent-aware safety classifiers (SFT, DPO, distillation, GRPO) for robust LLM guardrails.
Jeremias Ferrao
Jazhyc
AI & ML interests
None yet
Recent Activity
upvoted a collection 2 days ago
AIMS: Intent-Aware Safety Classification
updated a collection 3 days ago
AIMS: Intent-Aware Safety Classification
updated a collection 3 days ago
AIMS: Intent-Aware Safety Classification
Organizations
None yet