Twitter, used as a marketing tool, has become a gold mine for companies interested in their online reputation. A significant research challenge in this setting is to disambiguate tweets with respect to company names: deciding whether a particular tweet is relevant or irrelevant to a company is an important task that has not yet been satisfactorily solved. To address this issue, this paper proposes a Wikipedia-based two-step filtering algorithm. As opposed to most other methods, the proposed approach is fully automatic and does not rely on hand-coded rules. The first step is a precision-oriented pass that uses Wikipedia as an external knowledge source to extract pertinent terms and phrases from certain parts of company Wikipedia pages and uses them as weighted filters to identify tweets about a given company. The second pass expands the first to increase recall by including more terms from URLs in tweets, Twitter user profile information, and hashtags. The approach is evaluated on a CLEF lab dataset and shows good performance, especially for English tweets.
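The two-pass idea in the abstract can be illustrated with a minimal Python sketch. This is one possible reading, not the authors' implementation: the `Tweet` fields, the `is_relevant` function, the term weights, and the threshold are all hypothetical, and the second pass is assumed here to widen the text that is matched against the same weighted terms.

```python
import re
from dataclasses import dataclass

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(TOKEN.findall(text.lower()))


def score(words, weighted_terms):
    """Sum the weights of all filter terms that occur in the token set."""
    return sum(w for term, w in weighted_terms.items() if term in words)


@dataclass
class Tweet:
    text: str
    hashtags: tuple = ()   # e.g. ("#ios",)
    profile: str = ""      # author's profile description
    url_text: str = ""     # text fetched from URLs in the tweet


def is_relevant(tweet, weighted_terms, threshold=1.0):
    """Two-pass filter (illustrative assumption, not the paper's exact rule)."""
    text_tokens = tokens(tweet.text)
    # Pass 1 (precision-oriented): match the tweet text only.
    if score(text_tokens, weighted_terms) >= threshold:
        return True
    # Pass 2 (recall-oriented): add terms from URLs, profile and hashtags.
    context = " ".join([tweet.url_text, tweet.profile, *tweet.hashtags])
    return score(text_tokens | tokens(context), weighted_terms) >= threshold


# Hypothetical weighted terms, as if extracted from a company's Wikipedia page.
apple_terms = {"iphone": 1.0, "ios": 1.0, "mac": 0.8, "cupertino": 0.6}

direct = Tweet("apple just announced the new iphone")
fruit = Tweet("apple pie for dessert")
indirect = Tweet("loving my new apple", hashtags=("#ios",),
                 profile="Mac developer in Cupertino")
```

In this sketch, `direct` is accepted by the first pass, `indirect` only by the second pass through its hashtag and profile terms, and `fruit` is rejected by both.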
Qureshi, M., Younus, A., O’Riordan, C., Pasi, G. (2015). Company name disambiguation in tweets: A two-step filtering approach. In Information Retrieval Technology (pp. 358-365). Cham, Switzerland: Springer Verlag [10.1007/978-3-319-28940-3_28].
Company name disambiguation in tweets: A two-step filtering approach
Qureshi, MA; Younus, A; Pasi, G
2015