Social media repositories serve as a significant source of evidence when extracting information related to the reputation of a particular entity (e.g., a particular politician, singer or company). Reputation management experts are in need of automated methods for mining the social media repositories (in particular Twitter) to monitor the reputation of a particular entity. A quite significant research challenge related to the above issue is to disambiguate tweets with respect to entity names. To address this issue in this paper we use "context phrases" in a tweet and Wikipedia disambiguated articles for a particular entity in a random forest classifier. Furthermore, we also utilize the concept of "relatedness" between tweet and entity using the Wikipedia category-article structure that captures the amount of discussion present inside a tweet related to an entity. The experimental evaluations show a significant improvement over the baseline and comparable performance with other systems representing strong performance given that we restrict ourselves to features extracted from Wikipedia. © Springer International Publishing Switzerland 2014.
Qureshi, M., O'Riordan, C., Pasi, G. (2014). Exploiting wikipedia for entity name disambiguation in tweets. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp.184-195). Springer Verlag [10.1007/978-3-319-07983-7_25].
Exploiting wikipedia for entity name disambiguation in tweets
QURESHI, MUHAMMAD ATIFPrimo
;PASI, GABRIELLAUltimo
2014
Abstract
Social media repositories serve as a significant source of evidence when extracting information related to the reputation of a particular entity (e.g., a particular politician, singer or company). Reputation management experts are in need of automated methods for mining the social media repositories (in particular Twitter) to monitor the reputation of a particular entity. A quite significant research challenge related to the above issue is to disambiguate tweets with respect to entity names. To address this issue in this paper we use "context phrases" in a tweet and Wikipedia disambiguated articles for a particular entity in a random forest classifier. Furthermore, we also utilize the concept of "relatedness" between tweet and entity using the Wikipedia category-article structure that captures the amount of discussion present inside a tweet related to an entity. The experimental evaluations show a significant improvement over the baseline and comparable performance with other systems representing strong performance given that we restrict ourselves to features extracted from Wikipedia. © Springer International Publishing Switzerland 2014.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.