Bicocca Open Archive

Ciapparelli, M., Zarbo, C., Marelli, M. (2025). Conceptual Combination in Large Language Models: Uncovering Implicit Relational Interpretations in Compound Words With Contextualized Word Embeddings. COGNITIVE SCIENCE, 49(3) [10.1111/cogs.70048].

Conceptual Combination in Large Language Models: Uncovering Implicit Relational Interpretations in Compound Words With Contextualized Word Embeddings

Ciapparelli, Marco; Zarbo, Calogero; Marelli, Marco

2025

Abstract

Large language models (LLMs) have been proposed as candidate models of human semantics, and as such, they must be able to account for conceptual combination. This work explores the ability of two LLMs, namely, BERT-base and Llama-2-13b, to reveal the implicit meaning of existing and novel compound words. According to psycholinguistic theories, understanding the meaning of a compound (e.g., “snowman”) involves its automatic decomposition into constituent meanings (“snow,” “man”), which are then connected by an implicit semantic relation selected from a set of possible competitors (FOR, MADE OF, BY, …) to obtain a plausible interpretation (“man MADE OF snow”). Here, we leverage the flexibility of LLMs to obtain contextualized representations for both target compounds (e.g., “snowman”) and their implicit interpretations (e.g., “man MADE OF snow”). We demonstrate that replacing a compound with a paraphrased version leads to changes to the embeddings that are inversely proportional to the paraphrase's plausibility, estimated by human raters. While this relation holds for both existing and novel compounds, results obtained for novel compounds are substantially weaker, and older distributional models outperform LLMs. Nonetheless, the present results show that LLMs can offer a valid approximation of the internal structure of compound words posited by cognitive theories, thus representing a promising tool to model word senses that are at once implicit and possible.
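
The measurement described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the three-dimensional vectors below are made-up placeholders standing in for contextualized embeddings from a model such as BERT-base or Llama-2-13b, the plausibility ratings are hypothetical, and the use of cosine distance and a Spearman rank correlation is an assumption about how the "change to the embeddings" and its relation to plausibility might be quantified.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation on ranks."""
    return pearson(ranks(x), ranks(y))

# Placeholder embedding of the compound in its sentence context (hypothetical).
compound_vec = [0.9, 0.1, 0.3]

# Placeholder embeddings after swapping in each paraphrase, paired with
# hypothetical human plausibility ratings (higher = more plausible).
paraphrases = {
    "man MADE OF snow": ([0.8, 0.2, 0.3], 6.5),
    "man FOR snow": ([0.3, 0.7, 0.4], 3.0),
    "man BY snow": ([0.1, 0.9, 0.2], 1.5),
}

distances = [cosine_distance(compound_vec, vec) for vec, _ in paraphrases.values()]
ratings = [rating for _, rating in paraphrases.values()]

# A negative value means more plausible paraphrases shift the embedding less.
rho = spearman(distances, ratings)
print(f"Spearman correlation between embedding change and plausibility: {rho:.3f}")
```

With real data, the placeholder vectors would be replaced by embeddings extracted from the chosen model, and a negative correlation would correspond to the inverse relation reported in the abstract.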

Subtype: Journal article - Scientific article
Keywords: Compound words; Computational modeling; Conceptual combination; Contextualized word embeddings; Large language models; Psycholinguistics
Content language: English
Ahead-of-print or first online publication date: 13 March 2025
Publication date: 2025
Journal: COGNITIVE SCIENCE
Volume: 49
Issue: 3
Article number: e70048
Article DOI: https://dx.doi.org/10.1111/cogs.70048
Fulltext: reserved
Citation: Ciapparelli, M., Zarbo, C., Marelli, M. (2025). Conceptual Combination in Large Language Models: Uncovering Implicit Relational Interpretations in Compound Words With Contextualized Word Embeddings. COGNITIVE SCIENCE, 49(3) [10.1111/cogs.70048].
Appears in types: 01 - Journal article

Files in this item:

File	Size	Format
Ciapparelli-2025-Cognitive Sci-VoR.pdf	2.88 MB	Adobe PDF

Access: archive managers only
Description: No accepted version policy in place
Attachment type: Author’s Accepted Manuscript, AAM (Post-print)
License: All rights reserved

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/560627
