Bicocca Open Archive

Bianco, S., Buzzelli, M., Ciocca, G., Piccoli, F., Schettini, R. (2026). A study on the generalization of DINOv2 features for food recognition tasks: A unified evaluation framework. INTELLIGENT SYSTEMS WITH APPLICATIONS, 29(March 2026) [10.1016/j.iswa.2026.200632].

A study on the generalization of DINOv2 features for food recognition tasks: A unified evaluation framework

Bianco, Simone; Buzzelli, Marco; Ciocca, Gianluigi; Piccoli, Flavio; Schettini, Raimondo

2026

Abstract

Self-supervised learning has recently gained increasing attention in computer vision, enabling the extraction of rich, general-purpose feature representations without requiring large annotated datasets. In this paper we aim to build a unified approach capable of deploying robust and effective analysis systems, replacing the need for multiple task-specific models trained end-to-end. Rather than introducing new architectures or training strategies, our goal is to systematically assess whether a single frozen self-supervised representation can support heterogeneous food-related tasks under realistic operating conditions. To this end, we perform an extensive analysis of DINOv2 features across multiple benchmark datasets and tasks, including food classification, segmentation, aesthetic assessment, and robustness to image distortions. In addition, we explore their capacity for continual learning by applying them to incremental food classification scenarios. Our findings reveal that DINOv2 features excel in many food-related applications. Their shared representations across tasks reduce the need for training separate models, while their strong generalization, high accuracy, and ability to handle complex multi-task scenarios make them a strong candidate for a unified food recognition approach. Specifically, DINOv2 features match or surpass state-of-the-art supervised methods in several food recognition tasks, while offering a simpler and more unified deployment strategy. Furthermore, they outperform end-to-end models in cross-dataset scenarios by up to +19.4% Top-1 accuracy and exhibit strong resilience to common image distortions, with a robustness gain of up to +48.0% in Top-1 accuracy percentage difference, ensuring reliable performance in real-world applications.
On average across all considered tasks, the DINOv2-based unified evaluation outperforms the state of the art by approximately 2.8% and 5.4%, depending on the chosen model size, while using only 6.2% and 23.9% of the total number of model parameters, respectively.
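
As a rough illustration of the frozen-representation idea described in the abstract, the following Python sketch fits two lightweight linear heads on top of one fixed feature extractor. It is not the authors' pipeline: the backbone is replaced by a hypothetical fixed random projection (`frozen_features`), the data are synthetic, and the ridge-regression probe and all function names are assumptions chosen for brevity.

```python
import numpy as np


def frozen_features(images: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a frozen backbone: fixed projection + L2 norm."""
    feats = images @ proj
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)


def fit_linear_head(feats: np.ndarray, labels: np.ndarray,
                    num_classes: int, l2: float = 1e-3) -> np.ndarray:
    """Linear probe: closed-form ridge regression onto one-hot targets."""
    targets = np.eye(num_classes)[labels]
    gram = feats.T @ feats + l2 * np.eye(feats.shape[1])
    return np.linalg.solve(gram, feats.T @ targets)


def predict(feats: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Pick the class with the highest linear score."""
    return (feats @ weights).argmax(axis=1)


def make_split(rng, centers, n):
    """Synthetic 'images': noisy samples around one center per class."""
    labels = rng.integers(0, len(centers), size=n)
    images = centers[labels] + 0.3 * rng.normal(size=(n, centers.shape[1]))
    return images, labels


def run_demo(seed: int = 0):
    rng = np.random.default_rng(seed)
    centers = 3.0 * rng.normal(size=(3, 32))   # three synthetic classes
    proj = rng.normal(size=(32, 16))           # "backbone" weights, never updated

    train_x, train_y = make_split(rng, centers, 300)
    test_x, test_y = make_split(rng, centers, 200)

    # Features are computed once and shared by both task heads.
    train_f = frozen_features(train_x, proj)
    test_f = frozen_features(test_x, proj)

    # Task A: 3-way classification. Task B: a binary attribute (class 0 or not).
    head_a = fit_linear_head(train_f, train_y, num_classes=3)
    head_b = fit_linear_head(train_f, (train_y == 0).astype(int), num_classes=2)

    acc_a = float((predict(test_f, head_a) == test_y).mean())
    acc_b = float((predict(test_f, head_b) == (test_y == 0).astype(int)).mean())
    return acc_a, acc_b


if __name__ == "__main__":
    print(run_demo())
```

Only the small per-task weight matrices are fitted here, which is the sense in which one shared representation can replace several end-to-end models; with a real backbone, `frozen_features` would be swapped for a forward pass through the pretrained network with gradients disabled.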

Subtype: Journal article - Scientific article
Keywords: Aesthetic assessment; Continual learning; Cross-domain adaptation; Food recognition; Food segmentation; Semi-supervised learning
Content language: English
Ahead-of-print date or first online publication date: 4 Feb 2026
Publication date: 2026
Journal: INTELLIGENT SYSTEMS WITH APPLICATIONS
Volume number: 29
Issue: March 2026
Article number: 200632
Article DOI: https://dx.doi.org/10.1016/j.iswa.2026.200632
Fulltext: open
Citation: Bianco, S., Buzzelli, M., Ciocca, G., Piccoli, F., Schettini, R. (2026). A study on the generalization of DINOv2 features for food recognition tasks: A unified evaluation framework. INTELLIGENT SYSTEMS WITH APPLICATIONS, 29(March 2026) [10.1016/j.iswa.2026.200632].
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
Bianco et al-2026-Intelligent Systems with Applications-VoR.pdf (open access; attachment type: Publisher’s Version (Version of Record, VoR); license: Creative Commons)	4.06 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/590182
