Social media repositories are a significant source of evidence when extracting information about the reputation of a particular entity (e.g., a politician, singer, or company). Reputation management experts manually mine these repositories (in particular Twitter) to monitor an entity's reputation. Recently, the online reputation management evaluation campaign known as RepLab at CLEF has turned attention to devising computational methods to assist reputation management experts. A significant research challenge in this area is disambiguating tweets with respect to entity names. In fact, deciding whether a particular tweet is relevant or irrelevant to a particular entity is an important task that has not yet been satisfactorily solved. To address this issue, in this paper we use "context phrases" in a tweet and Wikipedia disambiguated articles for a particular entity in an SVM classifier that utilizes features extracted from the Wikipedia graph structure, i.e., links into Wikipedia articles and links from Wikipedia articles. Additionally, we use term-specificity and term-collocation features derived from the Wikipedia article of the entity under investigation. The experimental evaluations do not show a significant improvement over the baseline, and other systems outperform our approach; however, manual inspection of feature sets and training data suggests that the proposed Wikipedia graph-based features may show promising outcomes when used in combination with sophisticated learning algorithms.
Qureshi, M., Younus, A., Abril, D., O'Riordan, C., Pasi, G. (2013). CIRG IRGDISCO at RepLab2013 Filtering Task: Use of Wikipedia's Graph Structure for Entity Name Disambiguation in Tweets. In Working Notes for CLEF 2013 Conference, Valencia, Spain, September 23-26, 2013. CEUR-WS.
CIRG IRGDISCO at RepLab2013 Filtering Task: Use of Wikipedia's Graph Structure for Entity Name Disambiguation in Tweets
QURESHI, MUHAMMAD ATIF; YOUNUS, ARJUMAND; PASI, GABRIELLA
2013