Hierarchical taxonomies serve as fundamental structures for reasoning with hierarchical concepts across various domains such as healthcare, finance, and economy. However, maintaining their relevance and accuracy is a labor-intensive and error-prone task, demanding experts to identify and revise novel concepts constantly. In this context, distributional semantics techniques offer a promising avenue by suggesting terms likely to be associated with existing concepts. In our study, we propose a method to enhance taxonomies by adding related terms using contextual word embedding as encoders. We introduce VESPATE (VEctor SPAce model for Taxonomy Enrichment), a system designed to automatically expand any given hierarchical taxonomy with new terms using three generative models. Additionally, we integrate VESPATE with human validation to identify and select the most suitable terms for inclusion in the taxonomy. VESPATE was deployed within an EU project to enrich the official European Skill taxonomy, ESCO, with 40K+ digital terms gathered from the Web, aligning ESCO skills with current labor market needs. A total of 924 terms were selected through VESPATE, with 757 new terms subsequently validated by domain experts as correctly matched. Our framework, employing a pool of LLMs as encoders, helped us mitigate the limitations of the generative model, reducing the potential for errors and ensuring precise results in taxonomy enrichment. Additionally, the implementation of VESPATE consistently decreased the human effort required for the project. We evaluated the robustness of our system against a baseline constructed using ESCO's hierarchy, achieving a 81% Positive Predictive Value (PPV) when combining all three models.
D'Amico, S., De Santo, A., Mercorio, F., Mezzanzanica, M. (2024). Enriching Skill Taxonomies through Vector Space Models. In 2024 IEEE International Conference on Big Data (BigData) (pp.2297-2302). Institute of Electrical and Electronics Engineers Inc. [10.1109/bigdata62323.2024.10825415].
Enriching Skill Taxonomies through Vector Space Models
D'Amico, Simone;De Santo, Alessia;Mercorio, Fabio;Mezzanzanica, Mario
2024
Abstract
Hierarchical taxonomies serve as fundamental structures for reasoning with hierarchical concepts across various domains such as healthcare, finance, and economy. However, maintaining their relevance and accuracy is a labor-intensive and error-prone task, demanding experts to identify and revise novel concepts constantly. In this context, distributional semantics techniques offer a promising avenue by suggesting terms likely to be associated with existing concepts. In our study, we propose a method to enhance taxonomies by adding related terms using contextual word embedding as encoders. We introduce VESPATE (VEctor SPAce model for Taxonomy Enrichment), a system designed to automatically expand any given hierarchical taxonomy with new terms using three generative models. Additionally, we integrate VESPATE with human validation to identify and select the most suitable terms for inclusion in the taxonomy. VESPATE was deployed within an EU project to enrich the official European Skill taxonomy, ESCO, with 40K+ digital terms gathered from the Web, aligning ESCO skills with current labor market needs. A total of 924 terms were selected through VESPATE, with 757 new terms subsequently validated by domain experts as correctly matched. Our framework, employing a pool of LLMs as encoders, helped us mitigate the limitations of the generative model, reducing the potential for errors and ensuring precise results in taxonomy enrichment. Additionally, the implementation of VESPATE consistently decreased the human effort required for the project. We evaluated the robustness of our system against a baseline constructed using ESCO's hierarchy, achieving a 81% Positive Predictive Value (PPV) when combining all three models.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.