arxiv:2512.08440

What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models

Published on Dec 9, 2025

Abstract

AI-generated summary: Research explores gender bias origins in neural machine translation by analyzing source sentence contexts that influence target language gender inflections through saliency attribution and contrastive translations.

Interpretability methods can help explain decisions taken by black-box models such as neural machine translation (NMT) systems or large language models (LLMs). Yet research in this area has rarely addressed a well-documented problem in these models: gender bias. In this work, we aim to move beyond simply measuring bias to exploring its origins. Working with gender-ambiguous natural source data, this exploratory study examines which context, in the form of input tokens in the English (EN) source sentence, influences (or triggers) the NMT model's choice of a particular gender inflection in the target languages, German (DE) and Spanish (ES). To analyse this, we compute saliency attribution based on contrastive translations. We first address the lack of a scoring threshold and examine different levels of source-word attribution to the model's gender decisions in the translation. We then compare salient source words with human perceptions of gender and find a noticeable overlap between human perceptions and model attribution. Additionally, we provide a linguistic analysis of salient words. Our work shows why it matters to understand a model's translation decisions in terms of gender and how they compare to human decisions, and argues that this information should be leveraged to mitigate gender bias.
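To make the idea of attribution over contrastive translations concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not say which attribution method or model was used, so the sketch swaps in simple erasure (occlusion) attribution on a toy two-way scorer. The weights in `TOY_WEIGHTS`, the example sentence, and all function names are hypothetical and invented for illustration. The score being explained is the log-probability difference between two contrastive target forms (feminine vs. masculine), and `top_k` shows one assumed way to cope with the lack of a natural scoring threshold.

```python
import math

# Hypothetical per-token weights (masculine, feminine) for a toy scorer.
# These numbers are invented for illustration; a real NMT model would
# supply the probabilities of the two contrastive translations instead.
TOY_WEIGHTS = {
    "the": (0.0, 0.0),
    "nurse": (0.2, 1.4),
    "engineer": (1.1, 0.3),
    "finished": (0.1, 0.1),
    "her": (0.0, 2.0),
    "shift": (0.3, 0.4),
}


def log_probs(tokens):
    """Return (log p(masc), log p(fem)) under a toy softmax over two forms."""
    logit_m = sum(TOY_WEIGHTS.get(t, (0.0, 0.0))[0] for t in tokens)
    logit_f = sum(TOY_WEIGHTS.get(t, (0.0, 0.0))[1] for t in tokens)
    log_z = math.log(math.exp(logit_m) + math.exp(logit_f))
    return logit_m - log_z, logit_f - log_z


def contrastive_score(tokens):
    """Log-probability of the feminine form minus that of the masculine form."""
    lp_m, lp_f = log_probs(tokens)
    return lp_f - lp_m


def erasure_saliency(tokens):
    """Attribute the contrastive score to each source token by deleting it.

    A positive value means the token pushes the scorer toward the feminine
    form; a negative value means it pushes toward the masculine form.
    """
    full = contrastive_score(tokens)
    saliency = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        saliency.append((tok, full - contrastive_score(reduced)))
    return saliency


def top_k(saliency, k):
    """Keep the k tokens with the largest absolute attribution."""
    return sorted(saliency, key=lambda pair: abs(pair[1]), reverse=True)[:k]


sentence = ["the", "nurse", "finished", "her", "shift"]
saliency = erasure_saliency(sentence)
for token, value in top_k(saliency, 2):
    print(f"{token}\t{value:+.2f}")
```

Picking a fixed `k` is only one option; a relative cutoff (for example, a fraction of the largest absolute attribution) would be an equally reasonable assumption when no principled threshold exists.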
