We present ITALIC, a large-scale benchmark dataset of 10,000 multiple-choice questions designed to evaluate the natural language understanding of the Italian language and culture. ITALIC spans 12 domains, exploiting public tests to score domain experts in real-world scenarios. We detail our data collection process, stratification techniques, and selection strategies. ITALIC provides a comprehensive assessment suite that captures commonsense reasoning and linguistic proficiency in a morphologically rich language. We establish baseline performances using 17 state-of-the-art LLMs, revealing current limitations in Italian language understanding and highlighting significant linguistic complexity and cultural specificity challenges. ITALIC serves as a benchmark for evaluating existing models and as a roadmap for future research, encouraging the development of more sophisticated and culturally aware natural language systems.

Seveso, A., Potertì, D., Federici, E., Mezzanzanica, M., Mercorio, F. (2025). ITALIC: An Italian Culture-Aware Natural Language Benchmark. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), April 29 - May 4, 2025 (pp. 1469-1478).

ITALIC: An Italian Culture-Aware Natural Language Benchmark

Seveso, A; Potertì, D; Federici, E; Mezzanzanica, M; Mercorio, F
2025

Type: paper
Keywords: machine learning; AI; LLMs; generative AI; NLP
Language: English
Conference: 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, April 29 - May 4, 2025
Year: 2025
Book title: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), April 29 - May 4, 2025
ISBN: 9798891761896
Volume: 1
Pages: 1469-1478
URL: https://aclanthology.org/2025.naacl-long.68/
Access: open
Files in this item:
Seveso-2025-naacl-long.68-VoR.pdf

Access: open access
Description: Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License.
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons
Size: 226.45 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/550684