PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models Paper • 2604.08340 • Published 4 days ago