De Varda, A., Marelli, M. (2022). The Effects of Surprisal across Languages: Results from Native and Non-native Reading. In 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing - Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022 (pp. 138-141). Association for Computational Linguistics (ACL).

The Effects of Surprisal across Languages: Results from Native and Non-native Reading

De Varda, A.; Marelli, M.
2022

Abstract

It is well known that the surprisal of an upcoming word, as estimated by language models, is a solid predictor of reading times (Smith and Levy, 2013). However, most of the studies that support this view are based on English and a few other Germanic languages, leaving open the question of whether such findings generalize across languages. Moreover, they tend to consider only the best-performing eye-tracking measure, which might conflate the effects of predictive and integrative processing. Furthermore, it is not clear whether prediction plays a role in non-native language processing in bilingual individuals (Grüter et al., 2014). We approach these problems at large scale, extracting surprisal estimates from mBERT and assessing their psychometric predictive power on the MECO corpus, a cross-linguistic dataset of eye movement behavior in reading (Siegelman et al., 2022; Kuperman et al., 2020). We show that surprisal is a strong predictor of reading times across languages and fixation measurements, and that its effects in L2 are weaker than in L1.
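Surprisal is conventionally defined as the negative log-probability of a word given its context, and psychometric predictive power is commonly quantified as the gain in log-likelihood when surprisal is added to a baseline regression model of reading times. The Python sketch below illustrates both ideas on synthetic data under stated assumptions. It is not the authors' pipeline: the abstract does not specify their regression model, the word probabilities here are simulated rather than extracted from mBERT, and the baseline predictor (word length only), the effect sizes, and the function names (`ols_fit`, `delta_loglik`, `simulate`) are hypothetical.

```python
import math
import random


def surprisal(prob):
    """Surprisal in bits: -log2 P(word | context)."""
    return -math.log2(prob)


def ols_fit(X, y):
    """Ordinary least squares: solve (X^T X) b = X^T y by Gauss-Jordan
    elimination with partial pivoting. Each row of X starts with 1 (intercept)."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def gaussian_loglik(X, y):
    """Maximized log-likelihood of a Gaussian linear model fitted by OLS."""
    beta = ols_fit(X, y)
    n = len(y)
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    sigma2 = rss / n  # maximum-likelihood estimate of the residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def delta_loglik(X_base, X_full, y):
    """Log-likelihood gain from adding predictors (here: surprisal) to a baseline."""
    return gaussian_loglik(X_full, y) - gaussian_loglik(X_base, y)


def simulate(n=500, seed=0):
    """Hypothetical reading times driven by word length and surprisal plus noise."""
    rng = random.Random(seed)
    X_base, X_full, rts = [], [], []
    for _ in range(n):
        length = rng.randint(2, 12)            # word length in characters
        s = surprisal(rng.uniform(0.01, 0.9))  # stand-in for an LM probability
        rt = 200 + 10 * length + 15 * s + rng.gauss(0, 20)  # ms; invented effect sizes
        X_base.append([1.0, length])
        X_full.append([1.0, length, s])
        rts.append(rt)
    return X_base, X_full, rts


if __name__ == "__main__":
    X_base, X_full, rts = simulate()
    gain = delta_loglik(X_base, X_full, rts)
    print(f"Delta log-likelihood from surprisal: {gain:.1f}")
```

Because the baseline model is nested in the full model, this in-sample gain cannot be negative; studies in this literature often fit mixed-effects models or evaluate on held-out data instead, which this sketch omits.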
Type: paper
Keywords: surprisal, sentence processing, multilanguage models
Language: English
Conference: 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, AACL-IJCNLP 2022, 20 November 2022 through 23 November 2022
Proceedings: 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing - Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022
ISBN: 9781959429043
Publication year: 2022
Pages: 138-141

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/467168
Citations
  • Scopus 12