Marelli, M., & Amenta, S. (2018). A database of orthography-semantics consistency (OSC) estimates for 15,017 English words. Behavior Research Methods, 50(4), 1482-1495 [10.3758/s13428-018-1017-8].

A database of orthography-semantics consistency (OSC) estimates for 15,017 English words

Marelli, M.; Amenta, S.
2018

Abstract

Orthography–semantics consistency (OSC) is a measure that quantifies the degree of semantic relatedness between a word and its orthographic relatives. OSC is computed as the frequency-weighted average semantic similarity between the meaning of a given word and the meanings of all the words containing that very same orthographic string, as captured by distributional semantic models. We present a resource including optimized estimates of OSC for 15,017 English words. In a series of analyses, we provide a progressive optimization of the OSC variable. We show that computing OSC from word-embedding models (in place of traditional count models), limiting preprocessing of the corpus used for inducing semantic vectors (in particular, avoiding part-of-speech tagging and lemmatization), and relying on a wider pool of orthographic relatives all improve the measure's performance in a lexical-processing task. We further show that OSC is an important and significant predictor of reaction times in visual word recognition and word naming, one that correlates only weakly with other psycholinguistic variables (e.g., family size, word frequency), indicating that it captures a novel source of variance in lexical access. Finally, we discuss some theoretical and methodological implications of adopting OSC as one of the predictors of reaction times in studies of visual word recognition.
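
The computation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cosine`, `osc`), the `include_target` flag, and the toy vectors and frequencies are all made up for illustration, cosine similarity is assumed as the similarity measure, and substring matching is assumed as the test for "containing that very same orthographic string". Whether the target word counts among its own relatives is not stated here, so the sketch leaves it out by default.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def osc(target, vectors, frequencies, include_target=False):
    """Frequency-weighted mean similarity between `target` and its relatives.

    Relatives are assumed to be all vocabulary words that contain `target`
    as a substring. Returns None when the word has no relatives.
    """
    relatives = [
        word for word in vectors
        if target in word and (include_target or word != target)
    ]
    total_frequency = sum(frequencies[word] for word in relatives)
    if total_frequency == 0:
        return None
    weighted_sum = sum(
        frequencies[word] * cosine(vectors[target], vectors[word])
        for word in relatives
    )
    return weighted_sum / total_frequency


# Toy data (hypothetical values, not taken from the resource).
vectors = {
    "corn": (1.0, 0.0),
    "corner": (0.0, 1.0),     # contains "corn" but is semantically unrelated
    "cornfield": (1.0, 0.0),  # contains "corn" and is semantically related
}
frequencies = {"corn": 50, "corner": 300, "cornfield": 100}

print(osc("corn", vectors, frequencies))
```

In this toy vocabulary the frequent but unrelated relative "corner" pulls the estimate down, which is the intuition behind the measure: a word whose letter string mostly appears in semantically unrelated words gets a low OSC.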
Journal article - Scientific article
Distributional semantic models; Form–meaning mapping; Lexical resources; Orthography–semantics consistency; Word recognition
Language: English
Year: 2018
Volume: 50
Issue: 4
First page: 1482
Last page: 1495
Access: open
Files in this item:
File: 10281-182592.pdf
Access: open access
Attachment type: Publisher's Version (Version of Record, VoR)
Size: 717.16 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/182592
Citations
  • Scopus 34
  • Web of Science (ISI) 32