The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.
Masini, F., Micheli, M., Zaninello, A., Castagnoli, S., Nissim, M. (2020). Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian. In Proceedings of the Seventh Italian Conference on Computational Linguistics.
Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian
Micheli, M. S.
2020