Bicocca Open Archive

Fixed encoder self-attention patterns in transformer-based machine translation

Raganato, A; Scherrer, Y; Tiedemann, J

2020

Abstract

Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that most attention heads learn simple, and often redundant, positional patterns. In this paper, we propose to replace all but one attention head of each encoder layer with simple fixed – non-learnable – attentive patterns that are solely based on position and do not require any external knowledge. Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality and even increases BLEU scores by up to 3 points in low-resource scenarios.
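The abstract does not list which fixed patterns are used, so the following is only a minimal sketch of the general idea, built on assumptions. The five patterns (current, previous and next token, plus uniform left and right context), the rule that clamps out-of-range positions to the sentence boundary, and every function name are hypothetical illustrations, not the paper's configuration. The point it shows: a fixed head is a constant attention matrix that depends only on token positions and sentence length, so it needs no query or key parameters. One head per layer keeps ordinary scaled dot-product attention.

```python
# Hypothetical sketch: the pattern set, boundary rule and names are
# illustrative assumptions, not the configuration used in the paper.
import math
from typing import List

Matrix = List[List[float]]


def shifted_pattern(n: int, offset: int) -> Matrix:
    """Fixed head: token i attends only to token i + offset.

    Out-of-range targets are clamped to the first or last token
    (an assumed boundary rule for this sketch).
    """
    rows = []
    for i in range(n):
        target = min(max(i + offset, 0), n - 1)
        rows.append([1.0 if k == target else 0.0 for k in range(n)])
    return rows


def left_context_pattern(n: int) -> Matrix:
    """Fixed head: token i attends uniformly to tokens 0..i."""
    return [[1.0 / (i + 1) if k <= i else 0.0 for k in range(n)] for i in range(n)]


def right_context_pattern(n: int) -> Matrix:
    """Fixed head: token i attends uniformly to tokens i..n-1."""
    return [[1.0 / (n - i) if k >= i else 0.0 for k in range(n)] for i in range(n)]


def weighted_sum(weights: Matrix, values: Matrix) -> Matrix:
    """Output row i is sum over k of weights[i][k] * values[k]."""
    dim = len(values[0])
    return [
        [sum(row[k] * values[k][c] for k in range(len(values))) for c in range(dim)]
        for row in weights
    ]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def learnable_head(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Ordinary scaled dot-product attention; the learned projections that
    would produce queries, keys and values are omitted in this sketch."""
    scale = math.sqrt(len(keys[0]))
    weights = [
        softmax([sum(q[c] * k[c] for c in range(len(q))) / scale for k in keys])
        for q in queries
    ]
    return weighted_sum(weights, values)


def hybrid_encoder_attention(x: Matrix) -> Matrix:
    """Concatenate several fixed positional heads with one learnable head.

    For simplicity every head reads the raw input x as its values; a real
    layer would add per-head value projections and an output projection.
    """
    n = len(x)
    fixed_patterns = [
        shifted_pattern(n, 0),   # current token
        shifted_pattern(n, -1),  # previous token
        shifted_pattern(n, +1),  # next token
        left_context_pattern(n),
        right_context_pattern(n),
    ]
    heads = [weighted_sum(p, x) for p in fixed_patterns]
    heads.append(learnable_head(x, x, x))
    # Concatenate head outputs along the feature dimension, as in multi-head attention.
    return [sum((h[i] for h in heads), []) for i in range(n)]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(weighted_sum(shifted_pattern(3, -1), tokens))  # → [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    out = hybrid_encoder_attention(tokens)
    print(len(out), len(out[0]))  # → 3 12
```

Because each fixed pattern depends only on sentence length, it could be computed once per length and reused. In this simplified sketch only the single learnable head computes input-dependent attention weights.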

Contribution type: paper
Keywords: transformer; machine translation; attention
Content language: English
Conference name: Findings of the Association for Computational Linguistics: EMNLP 2020
Conference year: 2020
Proceedings title: Findings of the Association for Computational Linguistics: EMNLP 2020
ISBN of the proceedings volume: 978-1-952148-90-3
Publication date: 2020
First page: 556
Last page: 568
DOI: https://dx.doi.org/10.18653/v1/2020.findings-emnlp.49
Full text: reserved
Citation: Raganato, A., Scherrer, Y., Tiedemann, J. (2020). Fixed encoder self-attention patterns in transformer-based machine translation. In Findings of the Association for Computational Linguistics: EMNLP 2020 (pp. 556-568). Association for Computational Linguistics (ACL) [10.18653/v1/2020.findings-emnlp.49].
Appears in types: 02 - Conference contribution

Files in this item:

File	Size	Format
2020.findings-emnlp.49.pdf (archive managers only)	1.2 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/361583
