Bicocca Open Archive

The attention mechanism is a successful technique in modern NLP, especially in tasks like machine translation. The recently proposed Transformer architecture is based entirely on attention mechanisms and achieves new state-of-the-art results in neural machine translation, outperforming other sequence-to-sequence models. However, so far not much is known about the internal properties of the model and the representations it learns to achieve that performance. To study this question, we investigate the information that is learned by the attention mechanism in Transformer models with different translation quality. We assess the representations of the encoder by extracting dependency relations based on self-attention weights, we perform four probing tasks to study how much syntactic and semantic information is captured, and we also test attention in a transfer learning scenario. Our analysis sheds light on the relative strengths and weaknesses of the various encoder representations. We observe that specific attention heads mark syntactic dependency relations, and we can also confirm that lower layers tend to learn more about syntax while higher layers tend to encode more semantics.

Raganato, A., Tiedemann, J. (2018). An Analysis of Encoder Representations in Transformer-Based Machine Translation. In EMNLP 2018 - 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Proceedings of the 1st Workshop (pp. 287-297). Association for Computational Linguistics (ACL) [10.18653/v1/W18-5431].

An Analysis of Encoder Representations in Transformer-Based Machine Translation

Raganato, Alessandro; Tiedemann, Jörg

2018

Contribution type: paper
Keywords: attention; transformer; machine translation
Content language: English
Conference name: 1st Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, co-located with the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - 1 November
Conference year: 2018
Proceedings title: EMNLP 2018 - 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Proceedings of the 1st Workshop
Proceedings ISBN: 978-1-948087-71-1
Publication date: 2018
First page: 287
Last page: 297
DOI: https://dx.doi.org/10.18653/v1/W18-5431
Full text: reserved
Appears in categories: 02 - Conference contribution

Files in this item:

File	Access	Size	Format
W18-5431.pdf	Archive managers only	547.97 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/361565
