Hierarchical taxonomies are crucial for organizing concepts across domains like healthcare, finance, and economics. However, maintaining their accuracy requires continuous updates, often demanding expert input. To address this, we propose TAXMAP (TAxonomy eXpansion through Collaborative LLM MAPping), a system that autonomously expands hierarchical taxonomies using contextual word embeddings and three generative models, with a human validation step ensuring relevance. Implemented within an EU initiative to enhance the European Skill taxonomy (ESCO), our framework processed over 40,000 digital terms gathered from the Web, aligning ESCO skills with labor market needs. This effort resulted in 924 proposed terms, 757 of which were validated by experts as correct. By leveraging large language models (LLMs) as encoders, TAXMAP reduces errors, ensures high precision, and minimizes human effort. Compared to a baseline based on ESCO's hierarchy, our system achieved an 81% Positive Predictive Value (PPV) when combining all three models.
D'Amico, S., De Santo, A., Mezzanzanica, M., Mercorio, F. (2025). Taxonomy Expansion through Collaborative LLM Mapping. In SAC '25: Proceedings of the 40th ACM/SIGAPP Symposium on Applied Computing (pp.1961-1968) [10.1145/3672608.3707906].
Taxonomy Expansion through Collaborative LLM Mapping
D'Amico, Simone;De Santo, Alessia;Mezzanzanica, Mario;Mercorio, Fabio
2025
Abstract
Hierarchical taxonomies are crucial for organizing concepts across domains like healthcare, finance, and economics. However, maintaining their accuracy requires continuous updates, often demanding expert input. To address this, we propose TAXMAP (TAxonomy eXpansion through Collaborative LLM MAPping), a system that autonomously expands hierarchical taxonomies using contextual word embeddings and three generative models, with a human validation step ensuring relevance. Implemented within an EU initiative to enhance the European Skill taxonomy (ESCO), our framework processed over 40,000 digital terms gathered from the Web, aligning ESCO skills with labor market needs. This effort resulted in 924 proposed terms, 757 of which were validated by experts as correct. By leveraging large language models (LLMs) as encoders, TAXMAP reduces errors, ensures high precision, and minimizes human effort. Compared to a baseline based on ESCO's hierarchy, our system achieved an 81% Positive Predictive Value (PPV) when combining all three models.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.