Marzocchi, M., Cremaschi, M., Pozzi, R., Avogadro, R., Palmonari, M. (2022). MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation. In SemTab 2022 Semantic Web Challenge on Tabular Data to Knowledge Graph Matching Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference (ISWC 2022) (pp.28-33). CEUR-WS.

MammoTab: a giant and comprehensive dataset for Semantic Table Interpretation

Cremaschi, M.; Pozzi, R.; Avogadro, R.; Palmonari, M.
2022

Abstract

In this paper, we present MammoTab, a dataset composed of 1M Wikipedia tables extracted from over 20M Wikipedia pages and annotated through Wikidata. Because datasets of this kind are lacking in the state of the art, MammoTab is a valuable resource for training and testing Semantic Table Interpretation approaches. The dataset has been designed to cover several key challenges, such as disambiguation, homonymy, and NIL-mentions. It has been evaluated using MTab, one of the top-performing approaches in the SemTab challenge.
Type: paper
Keywords: Knowledge Graph; Semantic Table Interpretation; SemTab Challenge; Tabular Data
Language: English
Conference: 2022 Semantic Web Challenge on Tabular Data to Knowledge Graph Matching, SemTab 2022
Conference year: 2022
Editors: Efthymiou, V.; Jiménez-Ruiz, E.; Chen, J.; Cutrona, V.; Hassanzadeh, O.; Sequeda, J.; Srinivas, K.; Abdelmageed, N.; Hulsebos, M.
Proceedings: SemTab 2022 Semantic Web Challenge on Tabular Data to Knowledge Graph Matching Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching co-located with the 21st International Semantic Web Conference (ISWC 2022)
Publication year: 2022
Volume: 3320
Pages: 28-33
URL: https://ceur-ws.org/Vol-3320/paper3.pdf
Access: open
Files in this item:
Marzocchi-2023-ISWC-VoR.pdf (Adobe PDF, 1.05 MB), open access
Description: Conference paper
Attachment type: Publisher’s Version (Version of Record, VoR)
License: Creative Commons


Use this identifier to cite or link to this document: https://hdl.handle.net/10281/423134
Citations
  • Scopus 3