We study the problem of group linkage: linking records that refer to multiple entities in the same group. Applications of group linkage include finding businesses in the same chain, finding social network users from the same organization, and so on. Group linkage faces new challenges compared to traditional entity resolution. First, although different members of the same group can share some similar global values of an attribute, they represent different entities and so can also have distinct local values for the same or different attributes, which requires a high tolerance for value diversity. Second, we need to be able to distinguish local values from erroneous values. We present a robust two-stage algorithm: the first stage identifies pivots (maximal sets of records that are very likely to belong to the same group) while remaining robust to possibly erroneous values; the second stage collects strong evidence from the pivots and leverages it to merge more records into the same group, while tolerating differences in local values of an attribute. Experimental results show the high effectiveness and efficiency of our algorithm on various real-world data sets.
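The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not define how pivots are built or what counts as strong evidence, so the sketch assumes a simple stand-in for each stage. Stage 1 links records that agree on at least `min_shared` attribute values, and stage 2 attaches a leftover record to a pivot when it shares any value that every pivot member has. The function names, the `min_shared` parameter, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_pivots(records, min_shared=2):
    """Stage 1 (assumed stand-in): group records that agree on at least
    `min_shared` attribute values, so one bad value cannot link two records."""
    parent = {rid: rid for rid in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        shared = sum(1 for k, v in records[a].items() if records[b].get(k) == v)
        if shared >= min_shared:
            parent[find(a)] = find(b)

    components = defaultdict(set)
    for rid in records:
        components[find(rid)].add(rid)
    # Only components with more than one record count as pivots here.
    return [group for group in components.values() if len(group) > 1]


def extend_groups(records, pivots):
    """Stage 2 (assumed stand-in): treat values shared by every pivot member
    as evidence, and attach leftover records that match that evidence."""
    groups = [set(p) for p in pivots]
    assigned = set().union(*groups)
    evidence = []
    for pivot in pivots:
        counts = defaultdict(int)
        for rid in pivot:
            for item in records[rid].items():
                counts[item] += 1
        evidence.append({item for item, c in counts.items() if c == len(pivot)})

    for rid, attrs in records.items():
        if rid in assigned:
            continue
        items = set(attrs.items())
        best = max(range(len(groups)),
                   key=lambda i: len(items & evidence[i]), default=None)
        if best is not None and items & evidence[best]:
            groups[best].add(rid)
    return groups


# Hypothetical sample data: three branches of one chain and one unrelated business.
records = {
    1: {"name": "Acme Pizza", "url": "acmepizza.example", "phone": "555-0100"},
    2: {"name": "Acme Pizza", "url": "acmepizza.example", "phone": "555-0101"},
    3: {"name": "Acme Pizza #3", "url": "acmepizza.example", "phone": "555-0102"},
    4: {"name": "Burger Hut", "url": "burgerhut.example", "phone": "555-0200"},
}
pivots = find_pivots(records)
groups = extend_groups(records, pivots)
```

In this sample, records 1 and 2 form the pivot because they agree on two values. Record 3 has a different local name and phone number but shares the chain's URL, so the second stage merges it into the group, while record 4 stays ungrouped.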

Li, P., Guo, X., Maurino, A., Srivastava, A. (2015). Robust group linkage. In Proceedings of the 24th International Conference on World Wide Web (pp. 647-657). Association for Computing Machinery, Inc [10.1145/2736277.2741118].

Robust group linkage

MAURINO, ANDREA
Penultimate
2015

Type: paper
Keywords: group linkage, k-robustness, pivot
Language: English
Conference: International Conference on World Wide Web, 18-22 May 2015
Proceedings: Proceedings of the 24th International Conference on World Wide Web
ISBN: 9781450334693
Year: 2015
Pages: 647-657
URL: http://dl.acm.org/citation.cfm?id=2736277.2741118
Access: reserved
Files in this item:
File: 4-2736277.2741118.pdf
Access: archive managers only
Attachment type: Publisher’s Version (Version of Record, VoR)
Size: 1.29 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/96843
Citations
  • Scopus 4
  • ISI Web of Science 0