On the Web, multilingual data are growing fast and are spread across a large number of sources. Ontologies have been proposed to ease data exchange and integration across applications. When data sources that use different ontologies have to be integrated, mappings between the concepts described in these ontologies have to be established. Cross-lingual ontology mapping is the task of establishing mappings between concepts lexicalized in different languages. It is currently considered an important challenge and plays a fundamental role in establishing semantic relations between concepts lexicalized in different languages, in order to align two language-based resources, to create multilingual lexical resources with rich lexicalizations, or to support bilingual data annotation. Most cross-lingual mapping methods include a step in which the concepts' lexicalizations are automatically translated into different languages. One of the most frequently adopted approaches in the state of the art for obtaining automatic translations uses multilingual lexical resources, such as machine translation tools, which have been recognized as the largest available resources for translations. However, the translation quality achieved by machine translation is limited and affected by noise; one reason is the polysemous and synonymous nature of natural languages. The quality of the translations used by a mapping method has a major impact on its performance. The main goal of this thesis is to provide an automatic cross-lingual mapping method that leverages lexical evidence obtained from automatic translations, in order to automatically support the decision in mapping concepts across different languages, or to support semi-automatic mapping workflows. In particular, the method targets mappings between very large, lexically rich resources, e.g., lexical ontologies.
The major contributions of this thesis can be summarized as follows: I present a classification-based interpretation of cross-lingual mappings; I analyze, at a large scale, the effectiveness of automatic translations on cross-lingual mapping tasks; I classify concepts in lexical ontologies based on different lexical characteristics; I propose an automatic cross-lingual lexical mapping method based on a novel translation-based similarity measure and a local similarity optimization algorithm; finally, I implement a Web tool that supports a semi-automatic mapping approach based on the proposed method.
The Web offers an ever-growing quantity of multilingual data, available in a large number of information sources. Ontologies have been proposed to facilitate the exchange and integration of data across different applications. To integrate information sources that use different ontologies, correspondences (i.e., mappings) must be established between the concepts specified in those ontologies. The process of generating such correspondences between concepts lexicalized in different languages is called cross-lingual ontology mapping. Cross-lingual ontology mapping is currently considered a difficult challenge and plays a fundamental role in establishing semantic relations between concepts lexicalized in different languages, for example in order to align two language-specific resources, to create multilingual resources with rich lexicalizations, or to support the annotation of bilingual data. Many cross-lingual ontology mapping techniques include a step in which the concepts' lexicalizations are automatically translated into different languages. One of the approaches most frequently adopted in the state of the art for obtaining automatic translations relies on multilingual lexical resources, such as machine translation tools, which are recognized as the most complete sources currently available. However, the quality of the translations obtained from machine translation tools is limited and affected by noise; one reason for this is the polysemous and synonymous nature of natural language. The quality of the translations used by a mapping method drastically affects its effectiveness.
The main goal of this thesis is to propose an automatic cross-lingual mapping method that exploits lexical evidence obtained from automatic translations in order to automatically support the mapping of concepts in different languages, or to support semi-automatic mapping processes. In particular, it aims to establish mappings between very large, lexically rich resources, such as lexical ontologies. The major contributions of this thesis can be summarized as follows: I propose a classification-based interpretation of cross-lingual mappings; I analyze on a large scale the effectiveness of automatic translations applied in cross-lingual mapping processes; I propose a classification of the concepts in lexical ontologies based on a set of different lexical characteristics; I propose an automatic cross-lingual mapping method that uses a new translation-based similarity measure and a local similarity optimization algorithm; finally, I implement a Web application that supports semi-automatic mapping based on the proposed method.
Abu Helou, M. (2016). Cross-Lingual Mapping of Lexical Ontologies with Automatic Translation. (Doctoral thesis, Università degli Studi di Milano-Bicocca, 2016).
Cross-Lingual Mapping of Lexical Ontologies with Automatic Translation
ABU HELOU, MAMOUN
2016
File | Size | Format
---|---|---
phd_unimib_775219.pdf (Open access. Description: Doctoral thesis. Attachment type: Doctoral thesis) | 5.57 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.