Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups

Ghilardi, Davide; Belotti, Federico; Palmonari, Matteo
2025

Abstract

Sparse AutoEncoders (SAEs) have recently been employed as a promising unsupervised approach for understanding the representations of layers of Large Language Models (LLMs). However, with the growth in model size and complexity, training SAEs is computationally intensive, as typically one SAE is trained for each model layer. To address this limitation, we propose Group-SAE, a novel strategy for training SAEs. Our method measures the similarity of the residual stream representations between contiguous layers, groups similar layers, and trains a single SAE per group. To balance the trade-off between efficiency and performance, we further introduce AMAD (Average Maximum Angular Distance), an empirical metric that guides the selection of an optimal number of groups based on representational similarity across layers. Experiments on models from the Pythia family show that our approach significantly accelerates training with minimal impact on reconstruction quality, and achieves downstream task performance and interpretability comparable to baseline SAEs trained layer by layer. This method provides an efficient and scalable strategy for training SAEs in modern LLMs.
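As a rough illustration of the grouping idea sketched in the abstract (not the authors' implementation, which the paper specifies in full), the code below takes one possible reading of AMAD: the angular distance between per-layer residual-stream representations, maximized within each contiguous group of layers and then averaged over groups. The helper names (angular_distance, contiguous_groups, amad), the use of a single mean activation vector per layer, and the 0.15 budget are all illustrative assumptions.

import numpy as np

def angular_distance(a, b):
    # Angular distance between two representation vectors, normalized to [0, 1].
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi)

def contiguous_groups(n_layers, n_groups):
    # Split layer indices 0..n_layers-1 into n_groups contiguous chunks.
    return [list(map(int, chunk)) for chunk in np.array_split(np.arange(n_layers), n_groups)]

def amad(layer_reprs, groups):
    # One possible reading of AMAD (assumption): for each group, take the maximum
    # pairwise angular distance between its layers, then average over groups.
    per_group_max = []
    for group in groups:
        dists = [angular_distance(layer_reprs[i], layer_reprs[j])
                 for i in group for j in group if i < j]
        per_group_max.append(max(dists) if dists else 0.0)
    return float(np.mean(per_group_max))

# Toy stand-in for per-layer residual-stream representations: a slow random walk,
# so contiguous layers are more similar to each other than distant ones.
rng = np.random.default_rng(0)
reprs = [rng.normal(size=512)]
for _ in range(11):
    reprs.append(reprs[-1] + 0.3 * rng.normal(size=512))

# Pick the smallest number of groups whose AMAD falls under an illustrative budget.
for k in range(1, len(reprs) + 1):
    groups = contiguous_groups(len(reprs), k)
    if amad(reprs, groups) < 0.15:
        print(f"{k} groups meet the budget: {groups}")
        break

Under this reading, a smaller AMAD budget yields more groups (approaching one SAE per layer), while a larger budget yields fewer groups and proportionally fewer SAEs to train.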
Type: paper
Keywords: Artificial Intelligence, Large Language Models, interpretability, efficient ML
Language: English
Conference: The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025) - November 4-9, 2025
Conference year: 2025
Editors: Christodoulopoulos, C.; Chakraborty, T.; Rose, C.; Peng, V.
Proceedings: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
ISBN: 9798891763326
Publication year: 2025
Pages: 18668-18688
Access: open
Ghilardi, D., Belotti, F., Molinari, M., Ma, T., Palmonari, M. (2025). Group-SAE: Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (pp.18668-18688) [10.18653/v1/2025.emnlp-main.942].
Files in this record:
File: Ghilardi-2025-EMNLP 2025-VoR.pdf
Access: open access
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 2.7 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/583203