You do not need optuna.get_current_trial(); Optuna does not expose a "current trial" global in normal usage. The Trial is passed into the objective function, and libraries (like Transformers) have to thread it through for you.
In Hugging Face Trainer hyperparameter search, the Optuna trial is available: it is stored on the trainer as trainer._trial during HP-search setup, and evaluation results are funneled through _report_to_hp_search(...), where Optuna reporting and pruning happen. (GitHub)
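For contrast, this is the plain Optuna pattern, where the trial is an explicit argument to your objective. A minimal sketch; train_and_eval is a hypothetical stand-in for your training code:

```python
import optuna

def objective(trial):
    # Optuna hands you the Trial object; there is no global "current trial".
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True)
    score = train_and_eval(lr)  # hypothetical helper that trains and returns eval F1
    trial.set_user_attr("note", "attrs are set on the trial object itself")
    return score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
```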
So the clean pattern is:

- Access the trial via trainer._trial (no globals).
- Track "best-so-far" metrics inside the trial run (because you get many evals per trial).
- Write only the best (or final) metrics into trial.user_attrs so the dashboard shows what you care about.
Background: why you “only see the last accuracy”
Inside one Optuna trial, Trainer calls evaluation multiple times (per epoch or per eval_steps). Each time, you get a metrics dict. If you just do:

```python
trial.set_user_attr("accuracy", metrics["eval_accuracy"])
```

then you overwrite the attribute on every eval, and you end up with the last accuracy, not "accuracy at best F1".
Also note: Optuna user attrs must be JSON-serializable, so cast NumPy scalars to float.
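For example, a value coming out of a NumPy-based metrics pipeline should be cast before it goes into user attrs (illustrative only; assumes trial is in scope):

```python
import numpy as np

acc = np.float32(0.9123)  # typical of metrics computed with NumPy / sklearn
trial.set_user_attr("eval_accuracy", float(acc))  # plain float is JSON-serializable; np.float32 is not
```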
The simplest robust solution: a callback that uses trainer._trial (no globals, safe in DDP)
You already found callbacks. The missing pieces are:
- don’t store every step unless you really want to (SQLite gets slow and bloated)
- only update attrs when the objective improves
- only write from rank 0 (world process zero)
```python
# deps:
# transformers>=4.57
# optuna>=4.x
from transformers import TrainerCallback


class OptunaUserAttrsCallback(TrainerCallback):
    def __init__(
        self,
        trainer,
        objective_key="eval_f1",  # or "eval_macro_f1", etc.
        extra_keys=("eval_accuracy", "eval_loss"),
        write_all_best_metrics=False,
    ):
        self.trainer = trainer
        self.objective_key = objective_key
        self.extra_keys = tuple(extra_keys)
        self.write_all_best_metrics = write_all_best_metrics
        self.best_objective = None
        self.best_metrics = None
        self.best_step = None

    def _get_trial(self):
        return getattr(self.trainer, "_trial", None)

    def on_train_begin(self, args, state, control, **kwargs):
        # hyperparameter_search reuses the same Trainer (and this callback) across
        # trials, so reset the best-so-far state at the start of every trial.
        self.best_objective = None
        self.best_metrics = None
        self.best_step = None

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics is None:
            return
        if not self.trainer.is_world_process_zero():
            return
        obj = metrics.get(self.objective_key)
        if obj is None:
            return
        obj = float(obj)
        if self.best_objective is None or obj > self.best_objective:
            self.best_objective = obj
            self.best_step = int(state.global_step)
            # store a JSON-safe copy of the metrics at the best step
            self.best_metrics = {
                k: float(v) for k, v in metrics.items() if isinstance(v, (int, float))
            }

            trial = self._get_trial()
            if trial is None or not hasattr(trial, "set_user_attr"):
                return
            # minimal, high-signal attrs
            trial.set_user_attr("best_step", self.best_step)
            trial.set_user_attr(f"best_{self.objective_key}", self.best_objective)
            for k in self.extra_keys:
                if k in metrics:
                    trial.set_user_attr(f"best_{k}", float(metrics[k]))
            # optional: dump every metric key at the best step
            if self.write_all_best_metrics:
                for k, v in self.best_metrics.items():
                    trial.set_user_attr(f"best_{k}", v)

    def on_train_end(self, args, state, control, **kwargs):
        # Important: make the objective Optuna sees match your best-so-far,
        # not whatever happened at the last eval.
        if not self.trainer.is_world_process_zero():
            return
        if self.best_objective is not None:
            self.trainer.objective = self.best_objective
```
Usage:
```python
# Make sure you evaluate during training, otherwise there is nothing to track:
# args.eval_strategy = "epoch" (or "steps") and you have an eval_dataset set.
trainer.add_callback(
    OptunaUserAttrsCallback(
        trainer,
        objective_key="eval_f1",
        extra_keys=("eval_accuracy", "eval_precision", "eval_recall"),
    )
)

best_run = trainer.hyperparameter_search(
    direction="maximize",
    backend="optuna",
    hp_space=optuna_hp_space,
    n_trials=args.n_trials,
    compute_objective=lambda m: m["eval_f1"],  # use the same key you track
    study_name=args.study_name,
    storage="sqlite:///optuna_trials.db",
    load_if_exists=True,
)
```
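Afterwards, you can read the recorded attrs back from the same storage as a quick sanity check outside the dashboard. A sketch, assuming the same study_name and storage as above; the user_attrs_* column names depend on which attrs your callback actually wrote:

```python
import optuna

study = optuna.load_study(study_name=args.study_name, storage="sqlite:///optuna_trials.db")
df = study.trials_dataframe(attrs=("number", "value", "params", "user_attrs"))
# Columns for user attrs are prefixed with "user_attrs_", e.g. "user_attrs_best_eval_f1".
print(df.sort_values("value", ascending=False).head())
```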
Why this works with Transformers:

- Trainer stores the live trial on self._trial in _hp_search_setup. (GitHub)
- Each eval calls _report_to_hp_search(...), which computes the objective and reports to Optuna. You piggyback by reading trainer._trial during those eval callbacks. (GitHub)
Why this is safer than your global:
- no cross-trial overwrites
- no thread-shared global state
- in DDP, you only write from rank 0
How to see these in the dashboard
If you are using Optuna Dashboard, it can show trial user attrs. The docs show running it against your SQLite storage.
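For example, assuming the optuna-dashboard package is installed and the storage string matches the one used above:

```python
# pip install optuna-dashboard
from optuna_dashboard import run_server

# Serve the dashboard against the same SQLite storage the study writes to,
# then open it in a browser and check each trial's user attributes.
run_server("sqlite:///optuna_trials.db")
# Equivalent CLI: optuna-dashboard sqlite:///optuna_trials.db
```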
Common pitfalls in your snippet

- Metric key names: Trainer.evaluate() typically prefixes metrics with eval_, so you usually want eval_f1 and eval_accuracy, not f1 and accuracy. Print metrics.keys() once to confirm (see the snippet after this list).
- Write volume: writing epoch_{step}_{key} for many keys on every eval can make SQLite sluggish. Prefer best_eval_accuracy, best_eval_loss, best_step.
- Objective mismatch: by default, the value returned to Optuna can end up being the "last eval objective". Setting trainer.objective = best_objective in on_train_end fixes that for single-objective studies.
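A quick way to confirm the keys before wiring up the callback (run this once, outside of hyperparameter_search):

```python
metrics = trainer.evaluate()
print(sorted(metrics.keys()))
# Typically something like:
# ['epoch', 'eval_accuracy', 'eval_f1', 'eval_loss', 'eval_runtime', 'eval_samples_per_second', ...]
```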
Similar cases and discussion online (context)

These aren't exactly about user_attrs, but they are the same integration pain point: Trainer.hyperparameter_search hides the Optuna plumbing.

- HF forum threads on hyperparameter_search with Optuna and what it does or does not expose (good for understanding expectations and limitations). (Hugging Face Forums)
- Transformers GitHub issues around hyperparameter_search behavior and backend quirks (useful when you hit edge cases like GPUs sitting idle or backend kwargs). (GitHub)
If I were in your exact case, what I would do

- Confirm metric keys once. Run one normal trainer.evaluate() and print the dict, then decide the canonical keys (eval_f1, eval_accuracy).
- Decide what "accuracy" you want. Usually you want accuracy at best F1 within each trial; that is what the callback above records.
- Keep user attrs small and meaningful. I would write best_step, best_eval_f1, best_eval_accuracy, and maybe best_eval_loss.
- Make Optuna's objective match best F1. Set trainer.objective in on_train_end like above; otherwise Optuna may rank trials by "last eval" instead of "best eval".
- Avoid in-process parallel trials with this integration. I would run trials sequentially per process (Optuna's default) and parallelize by launching multiple worker processes if needed, all pointing at the same RDB storage. This avoids sharing one Trainer across threads.
- If you truly need full per-epoch curves per trial, I would not spam user_attrs. I would log to a file per trial (JSONL) and attach it elsewhere, or use a richer tracking tool; Optuna's "artifact store" direction is relevant here (see the sketch after this list).
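If you do want the full curves, a per-trial JSONL logger could look roughly like this. A sketch under my own naming, not part of the callback above; JsonlEvalLogger and the hp_search_logs directory are made up:

```python
import json
import os

from transformers import TrainerCallback


class JsonlEvalLogger(TrainerCallback):
    """Append every eval's scalar metrics to one JSONL file per Optuna trial."""

    def __init__(self, trainer, log_dir="hp_search_logs"):
        self.trainer = trainer
        self.log_dir = log_dir

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics is None or not self.trainer.is_world_process_zero():
            return
        trial = getattr(self.trainer, "_trial", None)
        trial_id = getattr(trial, "number", "no_trial")  # Optuna trials expose .number
        os.makedirs(self.log_dir, exist_ok=True)
        path = os.path.join(self.log_dir, f"trial_{trial_id}.jsonl")
        record = {"step": int(state.global_step)}
        record.update(
            {k: float(v) for k, v in metrics.items() if isinstance(v, (int, float))}
        )
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")
```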
Summary
- optuna.get_current_trial() is not the right model; Optuna expects the trial to be passed to you.
- Transformers already stores the live trial at trainer._trial. Use that. (GitHub)
- Track best F1 inside each trial, and only then write best_eval_accuracy etc. into trial.user_attrs.
- Write attrs only on world process zero, and keep the number of attrs small.