Zavarella, V., Recupero, D., Consoli, S., Fenu, G., Angioni, S., Buscaldi, D., et al. (2024). Knowledge Graphs for Digital Transformation Monitoring in Social Media. In Proceedings of the 3rd International Workshop on Knowledge Graph Generation from Text (TEXT2KG) and Data Quality meets Machine Learning and Knowledge Graphs (DQMLKG), co-located with the Extended Semantic Web Conference (ESWC 2024) (pp. 1-13). CEUR-WS.

Knowledge Graphs for Digital Transformation Monitoring in Social Media

Osborne F.
2024

Abstract

Several techniques and workflows have emerged recently for automatically extracting knowledge graphs from documents like scientific articles and patents. However, adapting these approaches to integrate alternative text sources such as micro-blogging posts and news and to model open-domain entities and relationships commonly found in these sources is still challenging. This paper introduces an improved information extraction pipeline designed specifically for extracting a knowledge graph comprising open-domain entities from micro-blogging posts on social media platforms. Our pipeline utilizes dependency parsing and employs unsupervised classification of entity relations through hierarchical clustering over word embeddings. We present a case study involving the extraction of semantic triples from a tweet collection concerning digital transformation and show through two experimental evaluations on the same dataset that our system achieves precision rates exceeding 95% and surpasses similar pipelines by approximately 5% in terms of precision, while also generating a notably higher number of triples.
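The relation-classification step summarized above (unsupervised grouping of entity relations by hierarchical clustering over word embeddings) can be illustrated with a small sketch. It assumes that each extracted relation has already been reduced to a single label, here hypothetical verb lemmas such as "adopt" or "oppose", and that each label has an embedding vector; the 3-dimensional vectors below are invented toy values, not real embeddings. The sketch uses average-linkage agglomerative clustering with cosine distance and stops merging at a distance threshold. The linkage criterion, the threshold, and the names `cosine_distance`, `agglomerative_clusters`, and `toy_embeddings` are assumptions for illustration, not the authors' configuration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(embeddings, threshold):
    """Average-linkage agglomerative clustering (illustrative sketch).

    embeddings: dict mapping a relation label to its embedding vector.
    threshold: stop merging once the closest pair of clusters is farther
               apart than this cosine distance.
    Returns the clusters as a sorted list of sorted label lists.
    """
    labels = list(embeddings)
    # Precompute the cosine distance between every pair of labels.
    dist = {}
    for a, b in combinations(labels, 2):
        d = cosine_distance(embeddings[a], embeddings[b])
        dist[(a, b)] = dist[(b, a)] = d

    def average_linkage(c1, c2):
        # Mean distance over all cross-cluster label pairs.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[label] for label in labels]  # start from singletons
    while len(clusters) > 1:
        # Pick the closest pair of clusters; combinations yields i < j.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_linkage(clusters[i], clusters[j]) > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        del clusters[j]  # delete the higher index first so i stays valid
        clusters[i] = merged
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy embeddings for relation verbs (invented values).
toy_embeddings = {
    "adopt": (0.9, 0.1, 0.0),
    "implement": (0.85, 0.2, 0.05),
    "deploy": (0.8, 0.15, 0.1),
    "criticize": (0.0, 0.1, 0.95),
    "oppose": (0.05, 0.0, 0.9),
}

print(agglomerative_clusters(toy_embeddings, threshold=0.5))
# [['adopt', 'deploy', 'implement'], ['criticize', 'oppose']]
```

The threshold controls how coarse the resulting relation classes are: a lower value keeps more, finer-grained clusters. The naive loop above recomputes linkages on every merge and is meant only to show the technique; for realistic vocabularies a library routine such as SciPy's hierarchical clustering would be the usual choice.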
Type: paper
Keywords: Hierarchical Clustering; Information Extraction; Knowledge Graphs; Named Entity Recognition; Social Media Analysis; Word Embeddings
Language: English
Conference: Joint workshops: 3rd International Workshop on Knowledge Graph Generation from Text (TEXT2KG 2024) and Data Quality Meets Machine Learning and Knowledge Graphs (DQMLKG 2024), May 26-30, 2024
Conference year: 2024
Proceedings: Proceedings of the 3rd International Workshop on Knowledge Graph Generation from Text (TEXT2KG) and Data Quality meets Machine Learning and Knowledge Graphs (DQMLKG), co-located with the Extended Semantic Web Conference (ESWC 2024)
Publication year: 2024
Volume: 3747
Pages: 1-13
URL: https://ceur-ws.org/Vol-3747/
Access: open
Files in this item:
Zavarella-2024-TEXT2KG-VoR.pdf
Open access
Description: This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Attachment type: Publisher's Version (Version of Record, VoR)
License: Creative Commons
Size: 1.45 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/521193
Citations
  • Scopus 0