Potertì, D., Seveso, A., & Mercorio, F. (2025). Can Role Vectors Affect LLM Behaviour? In Findings of the Association for Computational Linguistics: EMNLP 2025 (pp. 17735–17747). Association for Computational Linguistics.
Can Role Vectors Affect LLM Behaviour?
Potertì, D.; Seveso, A.; Mercorio, F.
2025
Abstract
The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting. We construct 29 role vectors derived from model activations and evaluate their impact on benchmark performance across multiple domains. Our analysis investigates whether these vectors can effectively steer models toward domain-specific expertise. We evaluate two key interventions: (i) activation addition, which reinforces role-specific directions, and (ii) directional ablation, which removes them. Results on well-established benchmarks indicate that role vectors do, in fact, influence model behaviour, improving in-domain task performance while also yielding unexpected cross-domain gains. This, in turn, suggests that manipulating internal model representations has a greater impact on outcomes than persona-based prompting.
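
The abstract names two interventions on internal activations. As a rough illustration of what such operations typically look like (a minimal sketch assuming a single hidden-state vector h and a role direction r, not the authors' implementation), they can be written as follows:

```python
import torch

# Hedged sketch: given a hypothetical "role vector" r extracted from model
# activations, the two interventions described in the abstract amount to
# simple linear operations on a hidden state h.

def activation_addition(h: torch.Tensor, r: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Reinforce the role-specific direction by adding a scaled copy of r to h."""
    return h + alpha * r

def directional_ablation(h: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the role-specific direction by projecting h onto the
    subspace orthogonal to the unit vector along r."""
    r_hat = r / r.norm()
    return h - (h @ r_hat) * r_hat

# Toy usage on an 8-dimensional hidden state (dimensions are illustrative).
h = torch.randn(8)
r = torch.randn(8)
h_steered = activation_addition(h, r, alpha=0.5)
h_ablated = directional_ablation(h, r)
```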


