SWE-bench

community

https://swe-bench.com

Recent Activity

john-b-yang authored a paper about 18 hours ago

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

john-b-yang authored a paper about 18 hours ago

OpenThoughts: Data Recipes for Reasoning Models

john-b-yang authored a paper about 18 hours ago

LongCodeBench: Evaluating Coding LLMs at 1M Context Windows

john-b-yang

authored 9 papers about 18 hours ago

EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities

Paper • 2409.16165 • Published Sep 24, 2024

Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration

Paper • 2412.15701 • Published Dec 20, 2024

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

Paper • 2601.11868 • Published Jan 17 • 36

SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents

Paper • 2602.22124 • Published Feb 25 • 2

SWE-chat: Coding Agent Interactions From Real Users in the Wild

Paper • 2604.20779 • Published 23 days ago • 14

ProgramBench: Can Language Models Rebuild Programs From Scratch?

Paper • 2605.03546 • Published 10 days ago • 3

arpandeepk

published a dataset about 1 month ago

SWE-bench/SWE-prime

Viewer • Updated Apr 13 • 1.36k • 48

arpandeepk

updated a dataset about 1 month ago

SWE-bench/SWE-prime

Viewer • Updated Apr 13 • 1.36k • 48

john-b-yang

updated a collection 2 months ago

SWE-smith

Collection

SWE-smith datasets of task instances for different programming languages • 9 items • Updated Mar 9 • 3

john-b-yang

updated a dataset 2 months ago

SWE-bench/SWE-smith-cpp

Viewer • Updated Mar 9 • 5.12k • 92

john-b-yang

published a dataset 2 months ago

SWE-bench/SWE-smith-cpp

Viewer • Updated Mar 9 • 5.12k • 92

john-b-yang

updated a dataset 3 months ago

SWE-bench/SWE-smith-ts

Viewer • Updated Feb 28 • 5.03k • 440

john-b-yang

published a dataset 3 months ago

SWE-bench/SWE-smith-ts

Viewer • Updated Feb 28 • 5.03k • 440

ofirpress

in SWE-bench/SWE-bench_Verified 3 months ago

Update eval.yaml

#4 opened 3 months ago by SaylorTwift

Add eval.yaml

#2 opened 3 months ago by nielsr

john-b-yang

updated a dataset 3 months ago

SWE-bench/SWE-smith-java

Viewer • Updated Feb 12 • 7.47k • 1.02k

Team members 7
