Bicocca Open Archive

The fast development of novel approaches derived from the Transformer architecture has led to outstanding performance in different scenarios, from Natural Language Processing to Computer Vision. Recently, these approaches have achieved impressive results even in the challenging task of non-rigid shape matching. However, little is known about the capability of the Transformer-encoder architecture for the shape matching task, and its performance remains largely unexplored. In this paper, we step back and investigate the contribution made by the Transformer-encoder architecture compared to its more recent alternatives, focusing on why and how it works on this specific task. Thanks to the versatility of our implementation, we can harness the bi-directional structure of the correspondence problem, making it more interpretable. Furthermore, we prove that positional encodings are essential for processing unordered point clouds. Through a comprehensive set of experiments, we find that attention and positional encoding are (almost) all you need for shape matching. The simple Transformer-encoder architecture, coupled with relative position encoding in the attention mechanism, obtains strong improvements, reaching the current state of the art.

Raganato, A., Pasi, G., Melzi, S. (2023). Attention And Positional Encoding Are (Almost) All You Need For Shape Matching. COMPUTER GRAPHICS FORUM, 42(5 (August 2023)) [10.1111/cgf.14912].

Attention And Positional Encoding Are (Almost) All You Need For Shape Matching

Raganato, Alessandro;Pasi, Gabriella;Melzi, Simone

2023


	Subtype
	
				Journal article - Scientific article
			
	Keywords
	
				CCS Concepts: • Computing methodologies → Shape analysis; • Theory of computation → Computational geometry
			
	Content language
	
				English
			
	Ahead-of-print date or first online publication date
	
				10 Aug 2023
			
	Publication date
	
				2023
			
	Journal
	
				COMPUTER GRAPHICS FORUM
			
	Volume number
	
				42
			
	Issue
	
				5 (August 2023)
			
	Article number
	
				e14912
			
	Article DOI
	
				https://dx.doi.org/10.1111/cgf.14912
			
	Full text
	
				open
			
	Citation
	
				Raganato, A., Pasi, G., Melzi, S. (2023). Attention And Positional Encoding Are (Almost) All You Need For Shape Matching. COMPUTER GRAPHICS FORUM, 42(5 (August 2023)) [10.1111/cgf.14912].
			
	Appears in types:
	
				01 - Journal article

Files in this item:

File	Size	Format
10281-434459_VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	13.23 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/434459
