
An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation

Raganato, A.; Vázquez, R.; Creutz, M.; Tiedemann, J.
2021

Abstract

Zero-shot translation is a fascinating feature of Multilingual Neural Machine Translation (MNMT) systems. These MNMT models are usually trained on English-centric data, i.e., with English as either the source or the target language, and with a language label prepended to the input to indicate the target language. However, recent work has highlighted several flaws of these models in zero-shot scenarios: language labels are ignored and the wrong language is generated, or different runs yield highly unstable results. In this paper, we investigate the benefits of an explicit alignment to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross-attention head with word alignment supervision to stress the focus on the target language label. We compare and evaluate several MNMT systems on three multilingual MT benchmarks of different sizes, showing that simply supervising one cross-attention head to focus both on word alignments and language labels reduces the bias towards translating into the wrong language, improving zero-shot performance overall. Moreover, as an additional advantage, we find that our alignment supervision leads to more stable results across different training runs.
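The supervision described in the abstract can be illustrated with a minimal sketch (not the authors' implementation): one cross-attention head's weights are pushed, via a guided-alignment cross-entropy loss, towards word-alignment targets in which the prepended target-language label is assumed to occupy source position 0. The function name and toy data below are hypothetical.

```python
import numpy as np

def alignment_supervision_loss(attn, align):
    """Cross-entropy between one head's attention rows and alignment targets.

    attn  : (tgt_len, src_len) attention weights from the supervised head,
            each row summing to 1.
    align : (tgt_len, src_len) 0/1 word-alignment matrix; column 0 marks the
            prepended language label, so target words can also be pointed
            at the label (hypothetical convention for this sketch).
    """
    # Normalize each alignment row into a target probability distribution.
    target = align / align.sum(axis=1, keepdims=True)
    # Guided-alignment cross-entropy, averaged over target positions.
    eps = 1e-9
    return float(-(target * np.log(attn + eps)).sum(axis=1).mean())

# Toy example: 2 target words, 3 source tokens (position 0 = language label).
attn = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.1, 0.7]])
align = np.array([[1, 1, 0],   # word 0 aligns to the label and source word 1
                  [0, 0, 1]])  # word 1 aligns to source word 2
loss = alignment_supervision_loss(attn, align)
```

In training, a term like this would be added to the usual translation loss so that the chosen head keeps attending to the language label alongside the aligned source words.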
Yes
paper
machine translation; Natural Language Processing; multilinguality
English
2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - 7 November 2021 through 11 November 2021
978-195591709-4
Raganato, A., Vázquez, R., Creutz, M., Tiedemann, J. (2021). An Empirical Investigation of Word Alignment Supervision for Zero-Shot Multilingual Neural Machine Translation. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp.8449-8456). Association for Computational Linguistics (ACL).
Raganato, A; Vázquez, R; Creutz, M; Tiedemann, J
Files in this record:
EMNLP21_Raganatoetal.pdf (Adobe PDF, 304.47 kB); access restricted to archive administrators, copy available on request.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/361593
Citations
  • Scopus: 0
  • Web of Science: not available