Bicocca Open Archive

The Italian Crowdsourcing Project: Visual word recognition times for 130,495 Italian words

Amenta S.; de Varda A. G.; Mandera P.; Keuleers E.; Brysbaert M.; Marelli M.

2025

Abstract

Despite being widely spoken and studied by language and cognitive scientists, Italian lacks large resources of language processing data. The Italian Crowdsourcing Project (ICP) is a dataset of word recognition times and accuracy that includes responses to 130,465 words, making it the largest dataset of its kind in terms of items. The data were collected in an online word knowledge task in which over 156,000 native speakers of Italian took part. We validated the ICP dataset by (1) showing that ICP reaction times correlate strongly (r = .78) with lexical decision latencies collected in a traditional lab experiment, (2) showing that the effects of major psycholinguistic variables (e.g., frequency and length) can be replicated in this dataset, and (3) replicating the effect of word prevalence, which we compute here for the first time for Italian. Given the inclusion of many inflectional forms of verbs, adjectives, and nouns, we further showcase the potential of this dataset by exploring two phenomena that build on distinctive properties of Italian: inflectional entropy in verb paradigms and the clitic effect in isolated word recognition. In this paper we present the ICP resource and release response times, accuracy, and prevalence estimates for all the words included.

Subtype: Journal article - Scientific article
Keywords: Crowdsourcing; Lexical decision; Megastudy; Prevalence; Word recognition
Content language: English
Ahead-of-print / first online publication date: 28-Dec-2024
Publication date: 2025
Journal: Behavior Research Methods
Volume: 57
Issue: January 2025
Article number: 26
Article DOI: https://dx.doi.org/10.3758/s13428-024-02548-4
Full text: none
Citation: Amenta, S., de Varda, A. G., Mandera, P., Keuleers, E., Brysbaert, M., & Marelli, M. (2025). The Italian Crowdsourcing Project: Visual word recognition times for 130,495 Italian words. Behavior Research Methods, 57, Article 26. https://doi.org/10.3758/s13428-024-02548-4
Appears in collections: 01 - Journal article

Files in this item:

No files are associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/546970