Construction and Maintenance of Domain Specific Knowledge Graphs for Web Data Integration

Porrini, R

A Knowledge Graph (KG) is a semantically organized, machine-readable collection of types, entities, and the relations holding between them. A KG helps mitigate semantic heterogeneity in scenarios that require integrating data from independent sources into a so-called dataspace, which is realized by establishing mappings between the sources and the KG. Applications built on top of a dataspace provide advanced data access features to end-users based on the representation provided by the KG, which is obtained by enriching the KG with domain-specific facets. A facet is a specialized type of relation that models a salient characteristic of the entities of a particular domain (e.g., the vintage of wines) from an end-user perspective. To enrich a KG with a salient and meaningful representation of data, the domain experts in charge of maintaining the dataspace must possess extensive knowledge about disparate domains (e.g., from wines to football players). From an end-user perspective, the difficulty of defining domain-specific facets for dataspaces significantly degrades the user experience of data access features and thus the ability to fulfill end-users' information needs. Remarkably, this problem has not been adequately studied in the literature, which mostly focuses on enriching the KG with a generalist, coverage-oriented, and non-domain-specific representation of the data occurring in the dataspace. Motivated by this challenge, this dissertation introduces automatic techniques to support domain experts in enriching a KG with facets that provide a domain-specific representation of data. Since facets are a specialized type of relation, the techniques proposed in this dissertation aim at extracting salient domain-specific relations.
The fundamental components of a dataspace, namely the KG and the mappings between sources and KG elements, are leveraged to elicit such a domain-specific representation from the specialized data sources of the dataspace, and to give domain experts valuable information for supervising the process. Facets are extracted by leveraging the mappings already established between specialized sources and the KG. After extraction, a domain-specific interpretation of the facets is provided by reusing relations already defined in the KG, to ensure tight integration of data. This dissertation also introduces a framework for profiling the status of the KG, to support domain experts in supervising the above tasks. Altogether, the contributions presented in this dissertation provide a set of automatic techniques that support domain experts in evolving the KG of a dataspace towards a domain-specific, end-user-oriented representation. These techniques analyze and exploit the fundamental components of a dataspace (KG, mappings, and source data) with an effectiveness not achievable with state-of-the-art approaches, as shown by extensive evaluations conducted in both synthetic and real-world scenarios.

Porrini, R. (2016). Construction and Maintenance of Domain Specific Knowledge Graphs for Web Data Integration. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2016).