weiliu

thinkwee · https://thinkwee.top/about/

AI & ML interests

LLM reasoning, agents

Recent Activity

updated a Space about 7 hours ago

thinkwee/BibGuard

published a Space about 7 hours ago

thinkwee/BibGuard

upvoted a paper 2 days ago

Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs

Organizations

None yet

New activity in thinkwee/NOVEReason_5k 6 months ago

[bot] Conversion to Parquet

#1 opened 6 months ago by parquet-converter

commented 3 papers 8 months ago

NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Paper • 2505.16022 • Published May 21, 2025 • 4
