Bicocca Open Archive

Bianco, S., Buzzelli, M., Chiriaco, G., Napoletano, P., Piccoli, F. (2023). Food Recognition with Visual Transformers. In 2023 IEEE 13th International Conference on Consumer Electronics - Berlin (ICCE-Berlin) (pp.82-87). IEEE [10.1109/ICCE-Berlin58801.2023.10375660].

Food Recognition with Visual Transformers

Bianco, Simone; Buzzelli, Marco; Chiriaco, Gaetano; Napoletano, Paolo; Piccoli, Flavio

2023

Abstract

Food recognition is a major challenge in the field of computer vision, requiring models that can effectively handle the wide variability and complexity of food images. In this paper, we explore the use of vision transformers, a category of models based on self-attention mechanisms, to address the task of food recognition. We focus on training and fine-tuning different vision transformer architectures on Food2K, a large-scale dataset of food images with 2,000 categories. We compare the performance of vision transformers with convolutional neural networks (CNNs) on Food2K and Food101. In addition, we use state-of-the-art explainability techniques to highlight the regions of interest that vision transformers take into account when making a prediction. Our results show that vision transformers can achieve competitive results on food recognition tasks, with the added benefit that pre-training on Food2K improves their generalization capabilities and interpretability. This study highlights the potential of vision transformers in food computing, paving the way for future research in this field.

	Type of contribution
	
				slide + paper
			
	Keywords
	
				CNNs; food recognition; visual transformers; ViT
			
	Content language
	
				English
			
	Conference name
	
				2023 IEEE 13th International Conference on Consumer Electronics - Berlin (ICCE-Berlin) - 03-05 September 2023
			
	Conference year
	
				2023
			
	Proceedings title
	
				2023 IEEE 13th International Conference on Consumer Electronics - Berlin (ICCE-Berlin)
			
	Proceedings ISBN
	
				9798350324150
			
	Publication date
	
				2023
			
	First page
	
				82
			
	Last page
	
				87
			
	DOI of the contribution
	
				https://dx.doi.org/10.1109/ICCE-Berlin58801.2023.10375660
			
	Alternative URL
	
				https://ieeexplore.ieee.org/document/10375660
			
	Full text
	
				reserved
			
	Citation
	
				Bianco, S., Buzzelli, M., Chiriaco, G., Napoletano, P., Piccoli, F. (2023). Food Recognition with Visual Transformers. In 2023 IEEE 13th International Conference on Consumer Electronics - Berlin (ICCE-Berlin) (pp.82-87). IEEE [10.1109/ICCE-Berlin58801.2023.10375660].
			
	Appears in types:
	
				02 - Conference contribution

Files in this item:

File	Size	Format
Bianco-2023-ICCE Berlin-VoR.pdf	387.08 kB	Adobe PDF

Access: archive managers only. Attachment type: Publisher's Version (Version of Record, VoR). License: all rights reserved.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/456872
