SR-Scientist: Scientific Equation Discovery With Agentic AI

Community Article Published February 2, 2026

Author: Yoshitomo Matsubara
(Paper Authors: Shijie Xia, Yuhan Sun, Pengfei Liu)

I recently found an interesting open-sourced research project, "SR-Scientist: Scientific Equation Discovery With Agentic AI," accepted to ICLR 2026. The authors also published their proposed model and dataset on Hugging Face 🤗. This article briefly summarizes and discusses the work and artifacts.

Background

Symbolic regression (SR) is the task of producing a mathematical (symbolic) expression that fits a given dataset; such expressions are generally more interpretable to humans than deep learning models. Because of this interpretability, the topic has become increasingly popular in AI/ML for Science communities such as physics, applied mechanics, climatology, materials science, and chemistry.


Example: A symbolic regression model identifies a mathematical expression to explain the relation between input (left) and output (right) in the tabular data.
You can also play with the PySR HF Space to get a feel for what the SR task looks like.
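To make the figure's idea concrete, here is a minimal, hypothetical sketch of the SR task: given (x, y) pairs, search a tiny hand-written space of candidate expressions and pick the one with the lowest mean squared error. Real SR systems search a vastly larger expression space; the candidate set and the hidden equation below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 2.0 * x**2 + np.sin(x)  # hidden "true" equation (made up for this demo)

# Toy candidate expression space; a real SR method evolves/searches these.
candidates = {
    "2*x**2 + sin(x)": lambda x: 2.0 * x**2 + np.sin(x),
    "x**3": lambda x: x**3,
    "exp(x)": lambda x: np.exp(x),
    "3*x + 1": lambda x: 3.0 * x + 1.0,
}

# Score each candidate by mean squared error and keep the best one.
mse = {expr: float(np.mean((f(x) - y) ** 2)) for expr, f in candidates.items()}
best = min(mse, key=mse.get)
print(best)  # the expression that best explains the data
```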

The field of symbolic regression has been further revolutionized by the integration of Large Language Models (LLMs) into evolutionary search methodologies. However, LLMs currently function as static modules within human-designed workflows, lacking the autonomy required to independently formulate and iterate on hypotheses through direct environmental engagement. By anchoring scientific discovery in agentic models, LLMs can evolve from static tools into proactive agents capable of driving the end-to-end research process.

SR-Scientist

To enable LLMs to identify scientific equations, the authors proposed SR-Scientist, a novel framework driven by iterative feedback and long-horizon optimization (see Figure). Interestingly, they achieved this by leveraging a code interpreter to handle the heavy lifting of data analysis and performance assessment, instead of generating sympy-compatible equation strings or equation graphs as in prior SR studies.

Overview of the SR-Scientist framework. Source: GAIR-NLP/SR-Scientist

The framework uses two separate tools:

  1. Equation Evaluator: Performs BFGS-driven parameter optimization on Python-based mathematical functions and returns key goodness-of-fit metrics
  2. Data Analyzer: Conducts Python-driven data analysis for anomalies and patterns without visualization capabilities
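The Equation Evaluator's role can be sketched as follows. This is a hypothetical minimal version, not the authors' implementation: the agent proposes a Python equation skeleton with free constants, and the tool fits those constants with BFGS (here via `scipy.optimize.minimize`) and reports a goodness-of-fit metric (normalized MSE). The skeleton, parameter names, and data are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def equation(x, params):
    # Agent-proposed skeleton with free constants c0, c1 (hypothetical).
    c0, c1 = params
    return c0 * x**2 + c1 * np.sin(x)

def evaluate(equation, x, y_true, n_params=2):
    """Fit the skeleton's constants with BFGS and return (params, NMSE)."""
    def loss(params):
        return float(np.mean((equation(x, params) - y_true) ** 2))
    result = minimize(loss, x0=np.ones(n_params), method="BFGS")
    nmse = result.fun / float(np.var(y_true))
    return result.x, nmse

# Toy data generated from the same skeleton with c0=2.0, c1=0.5.
x = np.linspace(-3, 3, 200)
y = 2.0 * x**2 + 0.5 * np.sin(x)
params, nmse = evaluate(equation, x, y)
print(params, nmse)  # fitted constants and normalized MSE
```

The key point is that the agent never needs to fit constants itself: it only writes the skeleton, and the tool returns the optimized fit quality as feedback.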

The model (🤗GAIR/SR-Scientist-30B) was trained on a synthetic dataset (see below) with GRPO (Group Relative Policy Optimization) to optimize equations by leveraging the two tools over a long horizon with minimal human-defined pipelines (i.e., the agent is free to determine its own workflow for a given problem). Unlike math or code tasks, which typically use binary rewards based on execution outcomes, its reward design takes advantage of SR's unique properties and uses the performance of predicted equations (e.g., regression error) as the reward signal, since the true equations and the true output values for given inputs are available in the training dataset.
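The continuous-reward idea can be sketched in a few lines. The exact reward mapping below is an assumption for illustration, not the paper's formula; the point is that fit quality (normalized MSE against known true outputs) yields a graded signal rather than a binary pass/fail.

```python
import numpy as np

def nmse(y_pred, y_true):
    # Normalized mean squared error: MSE scaled by the variance of the target.
    return float(np.mean((y_pred - y_true) ** 2) / np.var(y_true))

def reward(y_pred, y_true):
    # Map fit quality to (0, 1]: perfect fit -> 1.0, bad fit -> near 0.
    # (This particular mapping is a made-up example.)
    return 1.0 / (1.0 + nmse(y_pred, y_true))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(reward(y_true, y_true))         # perfect prediction -> 1.0
print(reward(y_true + 10.0, y_true))  # poor prediction -> near 0
```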

The authors used a mixed strategy of role-based and model-based data synthesis. Using an LLM, they synthesized potential relationships between variables for each of the scientific scenarios considered in the study and constructed equation datasets by determining the values of the constants for each skeleton equation. The training dataset is available at 🤗GAIR/SR-Scientist.
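The skeleton-instantiation step described above can be sketched as follows. Everything here (the sampling ranges, the example skeleton, the function names) is a hypothetical illustration of the idea: take an LLM-proposed equation skeleton, sample concrete constants, and tabulate (x, y) pairs to form one synthetic training problem.

```python
import numpy as np

def instantiate(skeleton, n_constants, n_points=100, seed=0):
    """Turn an equation skeleton into one synthetic SR problem."""
    rng = np.random.default_rng(seed)
    constants = rng.uniform(0.5, 5.0, size=n_constants)  # assumed range
    x = rng.uniform(-2.0, 2.0, size=n_points)            # assumed domain
    y = skeleton(x, constants)
    return constants, x, y

# Example skeleton "c0 * x**2 + c1" (made up for this sketch).
constants, x, y = instantiate(lambda x, c: c[0] * x**2 + c[1], n_constants=2)
print(constants.shape, x.shape, y.shape)
```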

Their experiments with LSR-Synth, the group of 129 synthetic problems in the LLM-SRBench dataset (🤗nnheui/llm-srbench), demonstrate that the SR-Scientist model, even without RL training, outperformed the baselines in terms of Acc_{0.01} (the rate of solutions whose normalized mean squared error is at most 0.01), Acc_{0.001}, and symbolic accuracy (the rate of solutions identical to the true equations). The results also show that RL training further improved SR-Scientist's results.
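The Acc_{0.01} and Acc_{0.001} metrics mentioned above are simple to compute once per-problem NMSEs are available; here is a small sketch (the per-problem NMSE values are made up):

```python
import numpy as np

def acc_at(nmse_values, threshold):
    # Fraction of benchmark problems whose predicted equation reaches
    # a normalized MSE at or below the threshold.
    nmse_values = np.asarray(nmse_values)
    return float(np.mean(nmse_values <= threshold))

nmses = [0.0005, 0.003, 0.02, 0.5]  # toy per-problem NMSEs (made up)
print(acc_at(nmses, 0.01))   # -> 0.5  (2 of 4 problems pass)
print(acc_at(nmses, 0.001))  # -> 0.25
```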

Set-up and Play

Try SR-Scientist using the LLM-SRBench dataset

Step 1: Set up and run Sandbox
git clone https://github.com/bytedance/SandboxFusion.git
cd SandboxFusion
conda create -n sandbox-runtime python=3.11 -y
conda activate sandbox-runtime
poetry install

mkdir -p docs/build
cd runtime/python
pip install -r requirements.txt

cd ../..
make run-online

Step 2: Set up SR-Scientist (open a new terminal/tab)
git clone https://github.com/GAIR-NLP/SR-Scientist
cd SR-Scientist
conda create -n srscientist python=3.11 -y
conda activate srscientist
pip install torch==2.8.0
pip install "sglang[all]==0.5.2"
pip install h5py

Step 3: Preprocess LLM-SRBench dataset
hf auth login
# Visit and agree to terms at https://huggingface.co/datasets/nnheui/llm-srbench
hf download nnheui/llm-srbench --repo-type dataset --local-dir ./data/inference

Step 4: Run inference
bash inference/scripts/inference_whole_sr_scientist_30B.sh

Their code repository provides more details, and you can learn about other usage such as training.

Potential Extension/Application

A unique property of SR-Scientist is its autonomous interaction with data through tools over multiple turns, collecting enough information to design equations that fit the data. While their experimental results suggest that it is still challenging for existing LLMs to handle SR tasks, which require understanding tabular data as text, I find their approach promising for future studies on ML for Science.

For example, a potential SR application based on such agentic SR methods could be built by replacing the data analysis tool with human domain experts, to control the degree of exploration to some extent. Involving human domain experts is essential when discussing the results of data-driven scientific discovery, and allowing them to provide feedback in natural language sounds practical and reasonable.

Conclusion

SR-Scientist is a novel agentic framework that autonomously serves symbolic regression tasks and discovers scientific equations through long-horizon, tool-driven data analysis and equation evaluation. The code, model, and training dataset are publicly available.

Related Papers

Are you new to symbolic regression (SR) and looking for papers to learn more? SR datasets and benchmarks have been increasingly discussed and improved. E.g.,

  1. SRBench - "Contemporary Symbolic Regression Methods and their Relative Performance" @ NeurIPS 2021 Datasets and Benchmarks Track
  2. SRSD - "Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery" @ Journal of Data-centric Machine Learning Research (2024)
  3. LLM-SRBench - "LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models" @ ICML 2025
