Bicocca Open Archive

Multi-property Steering of Large Language Models with Dynamic Activation Composition

Scalena D.; Sarti G.; Nissim M.

2024

Abstract

Activation steering methods were shown to be effective in conditioning language model generation by additively intervening over models’ intermediate representations. However, the evaluation of these techniques has so far been limited to single conditioning properties and synthetic settings. In this work, we conduct a comprehensive evaluation of various activation steering strategies, highlighting the property-dependent nature of optimal parameters to ensure a robust effect throughout generation. To address this issue, we propose Dynamic Activation Composition, an information-theoretic approach to modulate the steering intensity of one or more properties throughout generation. Our experiments on multi-property steering show that our method successfully maintains high conditioning while minimizing the impact of conditioning on generation fluency.
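
As a rough illustration of the two ideas named in the abstract, the sketch below shows (1) an additive intervention on a hidden state and (2) a steering intensity that is recomputed at each generation step. The abstract does not spell out the modulation rule, so the KL-based rule here is an assumption made purely for illustration, and the names `steer`, `dynamic_alpha`, and `max_alpha` are hypothetical rather than taken from the paper.

```python
import math


def steer(hidden, direction, alpha):
    """Additively shift a hidden state along a steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def dynamic_alpha(steered_logits, base_logits, max_alpha=2.0):
    """ASSUMED rule (not confirmed by the abstract): use the divergence
    between steered and unsteered next-token distributions as the
    steering intensity, clipped to [0, max_alpha]."""
    kl = kl_divergence(softmax(steered_logits), softmax(base_logits))
    return min(kl, max_alpha)


# Toy usage: identical distributions give zero intensity, so the
# hidden state is left unchanged at that step.
alpha = dynamic_alpha([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(steer([1.0, 2.0], [0.5, -1.0], alpha))
```

Under this assumed rule, steering fades when the steered and unsteered predictions already agree and is capped when they differ sharply, which is one way an intensity could vary throughout generation.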

Contribution type: paper
Keywords: Activation steering; Language model conditioning; Intermediate representation intervention; Dynamic Activation Composition; Multi-property steering; Information-theoretic approach; Generation fluency
Content language: English
Conference name: 7th BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP 2024 - 15 November 2024
Conference year: 2024
Volume editors: Belinkov, Y; Kim, N; Jumelet, J; Mohebbi, H; Mueller, A; Chen, H
Proceedings title: BlackboxNLP 2024 - 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP - Proceedings of the Workshop
Proceedings ISBN: 9798891761704
Publication date: 2024
First page: 577
Last page: 603
DOI: https://dx.doi.org/10.18653/v1/2024.blackboxnlp-1.34
Alternative URL: https://aclanthology.org/2024.blackboxnlp-1.34/
Full text: open
Citation: Scalena, D., Sarti, G., Nissim, M. (2024). Multi-property Steering of Large Language Models with Dynamic Activation Composition. In BlackboxNLP 2024 - 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP - Proceedings of the Workshop (pp. 577-603). Association for Computational Linguistics (ACL) [10.18653/v1/2024.blackboxnlp-1.34].
Appears in types: 02 - Conference contribution

Files in this item:

File	Size	Format
Scalena-2024-BlackboxNLP-VoR.pdf	1.69 MB	Adobe PDF

Access: open access
Description: Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/549141
