Categorization of instances in dataspaces is a difficult and time-consuming task, usually performed by domain experts. In this paper we propose a semi-automatic approach to the extraction of facets for the fine-grained categorization of instances in dataspaces. We focus on the case where instances are categorized under heterogeneous taxonomies in several sources. Our approach leverages Taxonomy Layer Distance, a new metric based on structural analysis of source taxonomies, to support the identification of meaningful candidate facets. Once validated and refined by domain experts, the extracted facets provide a fine-grained classification of dataspace instances. We implemented and evaluated our approach in a real-world dataspace in the eCommerce domain. Experimental results show that our approach is capable of extracting meaningful facets and that the new metric we propose for the structural analysis of source taxonomies outperforms other state-of-the-art metrics. © 2014 Springer International Publishing.

Porrini, R., Palmonari, M., Batini, C. (2014). Extracting facets from lost fine-grained categorizations in dataspaces. In 26th International Conference on Advanced Information Systems Engineering, CAiSE 2014 (pp. 580-594). Springer Verlag [10.1007/978-3-319-07881-6_39].

Extracting facets from lost fine-grained categorizations in dataspaces

PORRINI, RICCARDO (first author);
PALMONARI, MATTEO LUIGI (second author);
BATINI, CARLO (last author)
2014

Type: paper
Keywords: dataspaces; facet extraction; taxonomy integration; web data integration
Language: English
Conference: International Conference on Advanced Information Systems Engineering, CAiSE 2014, 16-20 June 2014
Proceedings: 26th International Conference on Advanced Information Systems Engineering, CAiSE 2014
ISBN: 978-331907880-9
Year: 2014
Volume: 8484
Pages: 580-594

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/58634