Raganato, A., Tiedemann, J. (2018). An Analysis of Encoder Representations in Transformer-Based Machine Translation. In EMNLP 2018 - 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Proceedings of the 1st Workshop (pp. 287-297). Association for Computational Linguistics (ACL) [10.18653/v1/W18-5431].

An Analysis of Encoder Representations in Transformer-Based Machine Translation

Raganato, Alessandro; Tiedemann, J.
2018

Abstract

The attention mechanism is a successful technique in modern NLP, especially in tasks like machine translation. The recently proposed Transformer architecture is based entirely on attention mechanisms and achieves new state-of-the-art results in neural machine translation, outperforming other sequence-to-sequence models. However, little is known so far about the internal properties of the model and the representations it learns to achieve that performance. To study this question, we investigate the information that is learned by the attention mechanism in Transformer models with different translation quality. We assess the representations of the encoder by extracting dependency relations based on self-attention weights, we perform four probing tasks to study how much syntactic and semantic information is captured, and we test attention in a transfer learning scenario. Our analysis sheds light on the relative strengths and weaknesses of the various encoder representations. We observe that specific attention heads mark syntactic dependency relations, and we confirm that lower layers tend to learn more about syntax while higher layers tend to encode more semantics.
Type: paper
Keywords: attention; transformer; machine translation
Language: English
Conference: 1st Workshop on BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, co-located with the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 - 1 November 2018
Published in: EMNLP 2018 - 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Proceedings of the 1st Workshop
ISBN: 978-1-948087-71-1
Year: 2018
Pages: 287-297
Access: reserved
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/361565