arxiv:2606.02373

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Published on Jun 1

Submitted by Patrick Jiang on Jun 2

chroma

Abstract

A 20B search agent trained with reinforcement learning inside a stateful search harness reaches 0.730 average curated recall across eight retrieval benchmarks by separating semantic search decisions from environment-side bookkeeping.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers. Its gains are especially strong on held-out transfer benchmarks, suggesting that reinforcement learning over explicit search state can produce retrieval behaviors that generalize beyond the training domains. Our code is available at https://github.com/pat-jj/harness-1.

Community

pat-jj

Paper submitter about 24 hours ago

🔥 Introducing Harness-1 🔥

Harness-1 is a 20B open search agent trained with state-externalizing harnesses, matching or outperforming several much larger frontier-model searchers on difficult retrieval tasks.

[Figure: Harness-1 performance]

Motivation

Many search agents are trained over growing transcripts. As a result, the model has to search while also doing a lot of implicit bookkeeping:

remembering candidate documents,
tracking useful evidence,
maintaining verification status,
recalling search history,
and avoiding repeat visits to documents it has already seen.

This makes the model responsible not only for search decisions, but also for managing the entire search state inside its context.

Key idea

Harness-1 separates these responsibilities.

The policy still makes the semantic decisions:

what to search,
what to inspect,
what to curate,
what to verify,
and when to stop.

But the harness maintains the recoverable search state around those decisions, including candidate pools, curated evidence, evidence links, verification records, and budget-aware context rendering.

With this setup, RL does not need to teach the model to manage an unstructured transcript from scratch. Instead, it trains the model to operate over a structured search workspace.
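
To make the division of labor concrete, here is a minimal sketch of what an environment-side search workspace could look like. It is not the Harness-1 implementation: the class name `SearchWorkspace`, its methods, the character-based budget, and the truncation used as a stand-in for compression are all assumptions made for illustration. The policy would call `curate`, `discard`, and `record_verification` as its semantic decisions, while deduplication and rendering stay inside the harness.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchWorkspace:
    """Hypothetical harness state; all names here are illustrative assumptions."""

    char_budget: int = 400
    candidates: dict[str, str] = field(default_factory=dict)  # doc_id -> shortened text
    curated: dict[str, str] = field(default_factory=dict)     # doc_id -> importance tag
    verified: dict[str, bool] = field(default_factory=dict)   # claim -> check outcome
    queries: list[str] = field(default_factory=list)          # search history

    def observe(self, query: str, results: dict[str, str]) -> list[str]:
        """Record a search and return only the doc ids not seen before."""
        self.queries.append(query)
        new_ids = [doc_id for doc_id in results if doc_id not in self.candidates]
        for doc_id in new_ids:
            # Truncation is a crude stand-in for observation compression.
            self.candidates[doc_id] = results[doc_id][:80]
        return new_ids

    def curate(self, doc_id: str, importance: str) -> None:
        """Policy decision: keep a known candidate with an importance tag."""
        if doc_id not in self.candidates:
            raise KeyError(f"unknown document: {doc_id}")
        self.curated[doc_id] = importance

    def discard(self, doc_id: str) -> None:
        """Policy decision: drop a document from the curated set."""
        self.curated.pop(doc_id, None)

    def record_verification(self, claim: str, ok: bool) -> None:
        """Store the outcome of a verification step chosen by the policy."""
        self.verified[claim] = ok

    def render(self) -> str:
        """Render the state within the budget, curated documents first."""
        lines = [f"queries: {len(self.queries)}"]
        lines += [f"[{tag}] {d}: {self.candidates[d]}" for d, tag in self.curated.items()]
        lines += [f"[cand] {d}" for d in self.candidates if d not in self.curated]
        kept, used = [], 0
        for line in lines:
            cost = len(line) + 1  # +1 accounts for the newline separator
            if used + cost > self.char_budget:
                break
            kept.append(line)
            used += cost
        return "\n".join(kept)


ws = SearchWorkspace(char_budget=60)
ws.observe("first query", {"d1": "alpha", "d2": "beta"})
fresh = ws.observe("second query", {"d2": "beta", "d3": "gamma"})  # d2 is a repeat
ws.curate("d3", "high")
print(fresh)
print(ws.render())
```

In this sketch the policy never has to remember which documents it has already seen or re-read its own history; it only sees the rendered workspace and chooses the next action.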

Results

Across 8 difficult retrieval benchmarks, Harness-1 reaches 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points, while remaining competitive with much larger frontier-model searchers.
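
The post does not define curated recall. As a rough illustration, the sketch below assumes it means the fraction of a query's relevant documents that end up in the agent's final curated set, averaged over queries. The function names and this definition are assumptions, not the paper's evaluation code.

```python
def curated_recall(curated_ids, relevant_ids):
    """Fraction of relevant documents present in the curated set (assumed definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    return len(relevant & set(curated_ids)) / len(relevant)


def average_curated_recall(examples):
    """Mean curated recall over (curated_ids, relevant_ids) pairs."""
    scores = [curated_recall(curated, relevant) for curated, relevant in examples]
    return sum(scores) / len(scores)


# Toy data: the first query recovers 2 of 4 relevant documents, the second 1 of 1.
examples = [
    (["a", "b", "x"], ["a", "b", "c", "d"]),
    (["q"], ["q"]),
]
print(average_curated_recall(examples))
```

Under this reading, curating an irrelevant document (such as `"x"` above) does not lower the score, so the metric rewards coverage of the relevant set rather than precision.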

The most interesting result to us is transfer: the gains are substantially larger on held-out transfer benchmarks than on source-family benchmarks. Ablations also show that removing the harness mechanisms changes agent behavior and hurts recall.