Celona, L., Ciocca, G., Schettini, R. (2026). Leveraging foundation model DINO-v2 for image complexity estimation. NEURAL COMPUTING & APPLICATIONS, 38(1) [10.1007/s00521-025-11786-2].
Leveraging foundation model DINO-v2 for image complexity estimation
Celona, Luigi; Ciocca, Gianluigi; Schettini, Raimondo
2026
Abstract
Automated complexity estimation can be used for efficiently analyzing large image datasets, improving image compression, and enhancing tasks like image recognition, segmentation, and crowd counting. However, traditional methods often lack integration flexibility for broader applications, as image complexity estimation is carried out with single-use, ad hoc models. To mitigate this problem, we propose to exploit DINO-v2, a self-supervised vision transformer with strong generalization capabilities, as a backbone for image feature extraction. This is the first work that leverages and evaluates different features extracted from a foundation model for image complexity estimation. Here, we study two kinds of features that can be leveraged in different tasks: global features (context information) in the form of a single complexity score, and local features (detailed information) in the form of a pixel-wise complexity map. The features are extracted from separate branches specifically incorporated into the model. In model training, we demonstrate that a criterion based on linear and rank correlation between predictions and labels outperforms the more commonly used MSE. Our model performs comparably to state-of-the-art methods in both intra- and cross-dataset experiments. By using a pre-trained model, we simplify the training process of our method, and we demonstrate that the general-purpose features of DINO-v2 can be effectively used for complexity estimation.
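The abstract mentions a training criterion combining linear (Pearson) and rank (Spearman) correlation between predicted and ground-truth complexity scores. The paper's exact formulation is not given here, so the following is a minimal illustrative sketch of such a criterion in NumPy; the function names (`correlation_criterion`, etc.) are hypothetical, the rank term uses plain (non-differentiable) ranks without tie handling, and an actual training loss would require a differentiable surrogate for the rank component.

```python
import numpy as np

def pearson_corr(x, y):
    # Linear (Pearson) correlation between predictions and labels.
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / (np.sqrt((x ** 2).sum() * (y ** 2).sum()) + 1e-8))

def spearman_corr(x, y):
    # Rank (Spearman) correlation: Pearson correlation of the ranks.
    # Double argsort yields each element's rank; ties are not handled here.
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return pearson_corr(rx, ry)

def correlation_criterion(preds, labels):
    # Combined criterion: lower is better, reaching 0 when both
    # correlations with the labels equal 1.
    return (1.0 - pearson_corr(preds, labels)) + (1.0 - spearman_corr(preds, labels))
```

Unlike MSE, this criterion is invariant to a positive affine rescaling of the predictions: a model whose scores are a shifted and scaled version of the labels still achieves the minimum value of 0.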


