The paper describes the creation of a manually validated dataset of Italian multiword expressions, building on candidates automatically extracted from corpora of written Italian. The main features of the resource, such as POS-pattern and lemma distribution, are also discussed, together with possible applications.

Masini, F., Micheli, M., Zaninello, A., Castagnoli, S., Nissim, M. (2020). Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian. In Proceedings of the Seventh Italian Conference on Computational Linguistics.

Multiword Expressions We Live by: A Validated Usage-based Dataset from Corpora of Written Italian

Micheli, M. S.; 2020

paper
Multiword expressions; Italian; corpora
English
Italian Conference on Computational Linguistics 2020
2020
Monti, J.; Dell'Orletta, F.; Tamburini, F.
Proceedings of the Seventh Italian Conference on Computational Linguistics
2020
2769
http://ceur-ws.org/Vol-2769/paper_33.pdf
none


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/300508