We present a detailed description of an algorithm tailored to detect external plagiarism in the PAN-09 competition. The algorithm is divided into three steps: (1) a first reduction of the size of the problem by selecting ten suspicious plagiarists using an n-gram distance on properly recoded texts; (2) a search for matches after T9-like recoding; (3) a “joining algorithm” that merges selected matches and is able to detect obfuscated plagiarism. The results are briefly discussed. Keywords: n-grams, plagiarism, coding, string matching
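As a rough illustration of the first two ingredients named in the abstract, the sketch below recodes text with a T9-like mapping and ranks candidate sources by an n-gram distance. Everything specific here is an assumption rather than the authors' method: the standard phone-keypad grouping, the choice to map non-letters to `0`, the particular distance formula, and the helper names `t9_recode`, `ngram_distance` and `closest_sources`. The abstract does not say which recoding is used for the selection step, so `closest_sources` simply expects strings the caller has already recoded.

```python
from collections import Counter

# Standard phone-keypad grouping (assumed; the paper's exact mapping may differ).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def t9_recode(text: str) -> str:
    """Map letters to keypad digits and every other character to '0'."""
    return "".join(LETTER_TO_DIGIT.get(ch, "0") for ch in text.lower())


def ngram_counts(s: str, n: int) -> Counter:
    """Count all overlapping character n-grams of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_distance(a: str, b: str, n: int = 8) -> float:
    """Illustrative distance: 0.0 for identical n-gram profiles, 1.0 for disjoint ones."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    keys = set(ca) | set(cb)
    if not keys:
        return 0.0
    # Each n-gram contributes its relative count difference, in [0, 1].
    total = sum(abs(ca[k] - cb[k]) / (ca[k] + cb[k]) for k in keys)
    return total / len(keys)


def closest_sources(suspicious: str, sources: dict, k: int = 10, n: int = 8) -> list:
    """Return the names of the k sources nearest to the suspicious text.

    Both the suspicious text and the source texts are assumed to be recoded already.
    """
    ranked = sorted(sources, key=lambda name: ngram_distance(suspicious, sources[name], n))
    return ranked[:k]
```

With this mapping, `t9_recode("Hello, world")` gives `"435560096753"`. Collapsing the alphabet this way shortens the symbol set at the cost of some ambiguity, which is why long n-grams are the natural unit for matching on recoded text; the default `n = 8` is only a placeholder value.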
Basile, C., Benedetto, D., Caglioti, E., Cristadoro, G., Degli Esposti, M. (2009). A plagiarism detection procedure in three steps: selection, matches and “squares”. In Proceedings of SEPLN 2009 - 3rd Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2009 and 1st International Competition on Plagiarism Detection; San Sebastian (Donostia); Spain; 10 September 2009 (pp. 19-23).
A plagiarism detection procedure in three steps: selection, matches and “squares”
Cristadoro, G.
2009
File | Size | Format |
---|---|---|
AAAplagioPAN_pubblicato.pdf (post-print; archive managers only) | 300.41 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.