MAdaKron: A mixture-of-AdaKron adapters

Braga M.; Raganato A.; Pasi G.
2026

Abstract

Adapting pre-trained Large Language Models to specific tasks has traditionally involved updating all of their parameters. However, this approach becomes impractical for models containing billions of parameters, which has led to intensive research on Parameter-Efficient Fine-Tuning (PEFT) techniques that train only a small fraction of the model's parameters while maintaining performance comparable to Full Fine-Tuning. A popular method is the Adapter, i.e., small trainable layers added to pre-trained models. We recently presented AdaKron, an Adapter-based PEFT technique that leverages the Kronecker product to combine the outputs of two small networks, training less than 0.55% of the model's parameters while outperforming Full Fine-Tuning. In this paper, we put forward a novel technique, MAdaKron, a Mixture-of-AdaKron model that combines the flexibility of a Mixture of Experts architecture with the efficiency of AdaKron to further enhance its performance. We extensively evaluate MAdaKron on eighteen Natural Language Understanding and Generation benchmarks, showing that it performs on par with or better than recent state-of-the-art PEFT methods while reducing the number of trainable parameters. These findings highlight MAdaKron as an efficient solution for Fine-Tuning LLMs, offering substantial computational cost reductions without sacrificing performance.
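
For illustration, the sketch below is a minimal PyTorch-style rendering of an AdaKron-like adapter and a mixture-of-adapters wrapper, based only on the description in the abstract. The class names, dimension split, activation, and dense softmax gating are assumptions made for this sketch, not the exact formulation used in the paper.

import torch
import torch.nn as nn

class KroneckerAdapter(nn.Module):
    """Illustrative AdaKron-style adapter: two small networks produce
    low-dimensional vectors whose Kronecker product forms the
    full-dimensional update added back to the hidden state.
    Dimensions and activation are assumptions, not the paper's exact setup."""

    def __init__(self, hidden_dim: int, dim_a: int = 24, dim_b: int = 32):
        super().__init__()
        assert dim_a * dim_b == hidden_dim  # e.g. 24 * 32 = 768 for BERT-base
        self.net_a = nn.Linear(hidden_dim, dim_a)  # small network 1
        self.net_b = nn.Linear(hidden_dim, dim_b)  # small network 2
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        a = self.act(self.net_a(x))  # (..., dim_a)
        b = self.act(self.net_b(x))  # (..., dim_b)
        # Kronecker product of the two small outputs, computed per token:
        # an outer product of shape (dim_a, dim_b), flattened back to hidden_dim.
        kron = (a.unsqueeze(-1) * b.unsqueeze(-2)).flatten(-2)
        return x + kron  # residual connection around the adapter

class MixtureOfKroneckerAdapters(nn.Module):
    """Hypothetical MAdaKron-like wrapper: a learned gate softly routes each
    token over several KroneckerAdapter experts. The dense softmax routing
    used here is an assumption for illustration only."""

    def __init__(self, hidden_dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            [KroneckerAdapter(hidden_dim) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)                    # (..., E)
        outs = torch.stack([expert(x) for expert in self.experts], -1)   # (..., H, E)
        return (outs * weights.unsqueeze(-2)).sum(dim=-1)                # (..., H)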
Journal article - Scientific article
Keywords: Adapters; Large language models; Mixture of experts
Language: English
Available online: 6 December 2025
Year: 2026
Volume: 334
Issue: 15 February 2026
Article number: 115086
Access: open
Braga, M., Raganato, A., Pasi, G. (2026). MAdaKron: A mixture-of-AdaKron adapters. KNOWLEDGE-BASED SYSTEMS, 334(15 February 2026) [10.1016/j.knosys.2025.115086].
Files in this item:

Braga-2026-Knowledge Based Sys-VoR.pdf

Open access

Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 4.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/10281/588398
Citations
  • Scopus 0
  • Web of Science (ISI) 0