InfoSynth: Information-Guided Benchmark Synthesis for LLMs Paper • 2601.00575 • Published 23 days ago • 3
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning Paper • 2512.16909 • Published Dec 18, 2025 • 2
Reliable Fine-Grained Evaluation of Natural Language Math Proofs Paper • 2510.13888 • Published Oct 14, 2025 • 2
Efficient and Scalable Estimation of Tool Representations in Vector Space Paper • 2409.02141 • Published Sep 2, 2024
Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions Paper • 2502.16761 • Published Feb 24, 2025
Virtual Personas for Language Models via an Anthology of Backstories Paper • 2407.06576 • Published Jul 9, 2024 • 1
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks Paper • 2503.09572 • Published Mar 12, 2025 • 2
Higher-Order Binding of Language Model Virtual Personas: a Study on Approximating Political Partisan Misperceptions Paper • 2504.11673 • Published Apr 16, 2025 • 1
Can Large Vision Language Models Read Maps Like a Human? Paper • 2503.14607 • Published Mar 18, 2025 • 10
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets Paper • 2505.15517 • Published May 21, 2025 • 4
Learning Adaptive Parallel Reasoning with Language Models Paper • 2504.15466 • Published Apr 21, 2025 • 44
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models Paper • 2311.18232 • Published Nov 30, 2023 • 1
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 63