RLHF Trojan Competition (Collection): Datasets and models used for the trojan detection competition co-located with SaTML 2024 (https://github.com/ethz-spylab/rlhf_trojan_competition) • 20 items • Updated Apr 30, 2024
Article: Introducing the Red-Teaming Resistance Leaderboard • steve-sli, richard2, leonardtang, clefourrier, +2 more • Feb 23, 2024
Quirky Models and Datasets (Collection): Datasets and finetuned models that can be used for Eliciting Latent Knowledge (ELK) research. • 180 items • Updated Feb 26, 2025