Bicocca Open Archive

A systematic study of inner-attention-based sentence representations in multilingual neural machine translation

Vazquez R.; Raganato A.; Creutz M.; Tiedemann J.

2020

Abstract

Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.

Subtype: Journal article - Scientific article
Keywords: machine translation; multilinguality; sentence representation; deep learning
Content language: English
Publication date: 2020
Journal: COMPUTATIONAL LINGUISTICS
Volume: 46
Issue: 2
First page: 387
Last page: 424
Article DOI: https://dx.doi.org/10.1162/COLI_a_00377
Fulltext: partially_open
Citation: Vazquez, R., Raganato, A., Creutz, M., Tiedemann, J. (2020). A systematic study of inner-attention-based sentence representations in multilingual neural machine translation. COMPUTATIONAL LINGUISTICS, 46(2), 387-424 [10.1162/COLI_a_00377].
Appears in types: 01 - Journal article

Files in this record:

File	Size	Format
2020.cl-2.5.pdf (archive managers only)	781.42 kB	Adobe PDF
10281-361577_VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	820.51 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/361577
