You do not need optuna.get_current_trial(); Optuna does not expose a "current trial" global in normal usage. The Trial is passed into the objective function, and libraries (like Transformers) have to thread it through for you.
In Hugging Face Trainer hyperparameter search, the Optuna trial is available: it is stored on the trainer as trainer._trial during HP-search setup, and evaluation results are funneled through _report_to_hp_search(...), where Optuna reporting and pruning happen. (GitHub)
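For contrast, this is the plain Optuna pattern, where the trial is an explicit argument to your objective. A minimal sketch; train_and_eval is a hypothetical stand-in for your training code:

```python
import optuna

def objective(trial):
    # Optuna hands you the Trial object; there is no global "current trial".
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    score = train_and_eval(lr)  # hypothetical helper that trains and returns eval F1
    trial.set_user_attr("note", "attrs are set on the trial object itself")
    return score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```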
So the clean pattern is:

- Access the trial via trainer._trial (no globals).
- Track "best-so-far" metrics inside the trial run (because you get many evals per trial).
- Write only the best (or final) metrics into trial.user_attrs so the dashboard shows what you care about.
Background: why you “only see the last accuracy”
Inside one Optuna trial, Trainer calls evaluation multiple times (per epoch or per eval_steps). Each time, you get a metrics dict. If you just do:

```python
trial.set_user_attr("accuracy", metrics["eval_accuracy"])
```

then you overwrite the attribute on every eval, and you end up with the last accuracy, not "accuracy at best F1".
Also note: Optuna user attrs must be JSON-serializable, so cast NumPy scalars to float.
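For example, a value coming out of a NumPy-based metrics pipeline should be cast before it goes into user attrs (illustrative only; assumes trial is in scope):

```python
import numpy as np

acc = np.float32(0.9123)  # typical of metrics computed with NumPy / sklearn
trial.set_user_attr("eval_accuracy", float(acc))  # plain float is JSON-serializable; np.float32 is not
```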
The simplest robust solution: a callback that uses trainer._trial (no globals, safe in DDP)
You already found callbacks. The missing pieces are:
- don’t store every step unless you really want to (SQLite gets slow and bloated)
- only update attrs when the objective improves
- only write from rank 0 (world process zero)
```python
# deps:
# transformers>=4.57
# optuna>=4.x
from transformers import TrainerCallback


class OptunaUserAttrsCallback(TrainerCallback):
    def __init__(
        self,
        trainer,
        objective_key="eval_f1",  # or "eval_macro_f1", etc.
        extra_keys=("eval_accuracy", "eval_loss"),
        write_all_best_metrics=False,
    ):
        self.trainer = trainer
        self.objective_key = objective_key
        self.extra_keys = tuple(extra_keys)
        self.write_all_best_metrics = write_all_best_metrics
        self.best_objective = None
        self.best_metrics = None
        self.best_step = None

    def _get_trial(self):
        return getattr(self.trainer, "_trial", None)

    def on_train_begin(self, args, state, control, **kwargs):
        # hyperparameter_search reuses the same Trainer (and this callback) across
        # trials, so reset the best-so-far state at the start of every trial.
        self.best_objective = None
        self.best_metrics = None
        self.best_step = None

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics is None:
            return
        if not self.trainer.is_world_process_zero():
            return
        obj = metrics.get(self.objective_key)
        if obj is None:
            return
        obj = float(obj)
        if self.best_objective is None or obj > self.best_objective:
            self.best_objective = obj
            self.best_step = int(state.global_step)
            # store a JSON-safe copy of the metrics at the best step
            self.best_metrics = {
                k: float(v) for k, v in metrics.items() if isinstance(v, (int, float))
            }

            trial = self._get_trial()
            if trial is None or not hasattr(trial, "set_user_attr"):
                return
            # minimal, high-signal attrs
            trial.set_user_attr("best_step", self.best_step)
            trial.set_user_attr(f"best_{self.objective_key}", self.best_objective)
            for k in self.extra_keys:
                if k in metrics:
                    trial.set_user_attr(f"best_{k}", float(metrics[k]))
            # optional: dump every metric key at the best step
            if self.write_all_best_metrics:
                for k, v in self.best_metrics.items():
                    trial.set_user_attr(f"best_{k}", v)

    def on_train_end(self, args, state, control, **kwargs):
        # Important: make the objective Optuna sees match your best-so-far,
        # not whatever happened at the last eval.
        if not self.trainer.is_world_process_zero():
            return
        if self.best_objective is not None:
            self.trainer.objective = self.best_objective
```
Usage:
```python
# Make sure you evaluate during training, otherwise there is nothing to track:
# args.eval_strategy = "epoch" (or "steps") and you have an eval_dataset set.
trainer.add_callback(
    OptunaUserAttrsCallback(
        trainer,
        objective_key="eval_f1",
        extra_keys=("eval_accuracy", "eval_precision", "eval_recall"),
    )
)

best_run = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=args.n_trials,
    compute_objective=lambda m: m["eval_f1"],  # use the same key you track
    study_name=args.study_name,
    storage="sqlite:///optuna_trials.db",
    load_if_exists=True,
)
```
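Afterwards, you can read the recorded attrs back from the same storage as a quick sanity check outside the dashboard. A sketch, assuming the same study_name and storage as above; the user_attrs_* column names depend on which attrs your callback actually wrote:

```python
import optuna

study = optuna.load_study(study_name=args.study_name, storage="sqlite:///optuna_trials.db")
df = study.trials_dataframe(attrs=("number", "value", "params", "user_attrs"))
# Columns for user attrs are prefixed with "user_attrs_", e.g. "user_attrs_best_eval_f1".
print(df.sort_values("value", ascending=False).head())
```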
Why this works with Transformers:

- Trainer stores the live trial on self._trial in _hp_search_setup. (GitHub)
- Each eval calls _report_to_hp_search(...), which computes the objective and reports to Optuna. You piggyback by reading trainer._trial during those eval callbacks. (GitHub)
Why this is safer than your global:
- no cross-trial overwrites
- no thread-shared global state
- in DDP, you only write from rank 0
How to see these in the dashboard
If you are using Optuna Dashboard, it can show trial user attrs. The docs show running it against your SQLite storage.
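For example, assuming the optuna-dashboard package is installed and the storage string matches the one used above:

```python
# pip install optuna-dashboard
from optuna_dashboard import run_server

# Serve the dashboard against the same SQLite storage the study writes to,
# then open it in a browser and check each trial's user attributes.
run_server("sqlite:///optuna_trials.db")
# Equivalent CLI: optuna-dashboard sqlite:///optuna_trials.db
```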
Common pitfalls in your snippet

- Metric key names: Trainer.evaluate() typically prefixes metrics with eval_, so you usually want eval_f1 and eval_accuracy, not f1 and accuracy. Print metrics.keys() once to confirm (see the snippet after this list).
- Write volume: writing epoch_{step}_{key} for many keys on every eval can make SQLite sluggish. Prefer best_eval_accuracy, best_eval_loss, best_step.
- Objective mismatch: by default, the value returned to Optuna can end up being the "last eval objective". Setting trainer.objective = best_objective in on_train_end fixes that for single-objective studies.
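A quick way to confirm the keys before wiring up the callback (run this once, outside of hyperparameter_search):

```python
metrics = trainer.evaluate()
print(sorted(metrics.keys()))
# Typically something like:
# ['epoch', 'eval_accuracy', 'eval_f1', 'eval_loss', 'eval_runtime', 'eval_samples_per_second', ...]
```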
Similar cases and discussion online (context)

These aren't exactly about user_attrs, but they are the same integration pain point: Trainer.hyperparameter_search hides the Optuna plumbing.

- HF forum threads on hyperparameter_search with Optuna and what it does or does not expose (good for understanding expectations and limitations). (Hugging Face Forums)
- Transformers GitHub issues around hyperparameter_search behavior and backend quirks (useful when you hit edge cases like GPUs sitting idle or backend kwargs). (GitHub)
If I were in your exact case, what I would do

- Confirm metric keys once. Run one normal trainer.evaluate() and print the dict, then decide the canonical keys (eval_f1, eval_accuracy).
- Decide what "accuracy" you want. Usually you want accuracy at best F1 within each trial; that is what the callback above records.
- Keep user attrs small and meaningful. I would write best_step, best_eval_f1, best_eval_accuracy, and maybe best_eval_loss.
- Make Optuna's objective match best F1. Set trainer.objective in on_train_end like above; otherwise Optuna may rank trials by "last eval" instead of "best eval".
- Avoid in-process parallel trials with this integration. I would run trials sequentially per process (Optuna's default) and parallelize by launching multiple worker processes if needed, all pointing at the same RDB storage. This avoids sharing one Trainer across threads.
- If you truly need full per-epoch curves per trial, I would not spam user_attrs. I would log to a file per trial (JSONL) and attach it elsewhere, or use a richer tracking tool; Optuna's "artifact store" direction is relevant here (see the sketch after this list).
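If you do want the full curves, a per-trial JSONL logger could look roughly like this. A sketch under my own naming, not part of the callback above; JsonlEvalLogger and the hp_search_logs directory are made up:

```python
import json
import os

from transformers import TrainerCallback


class JsonlEvalLogger(TrainerCallback):
    """Append every eval's scalar metrics to one JSONL file per Optuna trial."""

    def __init__(self, trainer, log_dir="hp_search_logs"):
        self.trainer = trainer
        self.log_dir = log_dir

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics is None or not self.trainer.is_world_process_zero():
            return
        trial = getattr(self.trainer, "_trial", None)
        trial_id = getattr(trial, "number", "no_trial")  # Optuna trials expose .number
        os.makedirs(self.log_dir, exist_ok=True)
        path = os.path.join(self.log_dir, f"trial_{trial_id}.jsonl")
        record = {"step": int(state.global_step)}
        record.update(
            {k: float(v) for k, v in metrics.items() if isinstance(v, (int, float))}
        )
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")
```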
Summary
- optuna.get_current_trial() is not the right model; Optuna expects the trial to be passed to you.
- Transformers already stores the live trial at trainer._trial. Use that. (GitHub)
- Track best F1 inside each trial, and only then write best_eval_accuracy etc. into trial.user_attrs.
- Write attrs only on world process zero, and keep the number of attrs small.