Basile, C., Benedetto, D., Caglioti, E., Cristadoro, G., Degli Esposti, M. (2009). A plagiarism detection procedure in three steps: selection, matches and “squares”. In Proceedings of SEPLN 2009 - 3rd Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2009 and 1st International Competition on Plagiarism Detection; San Sebastian (Donostia); Spain; 10 September 2009 (pp. 19-23).

A plagiarism detection procedure in three steps: selection, matches and “squares”

Cristadoro, G.
2009

Abstract

We present a detailed description of an algorithm tailored to detect external plagiarism in the PAN-09 competition. The algorithm is divided into three steps: (1) a first reduction of the size of the problem through the selection of ten suspicious plagiarists, using an n-gram distance on properly recoded texts; (2) a search for matches after T9-like recoding; and (3) a “joining algorithm” that merges selected matches and is able to detect obfuscated plagiarism. The results are briefly discussed. Keywords: n-grams, plagiarism, coding, string matching
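As an illustration of the first two ingredients named in the abstract, the following minimal Python sketch recodes text with the standard phone-keypad (T9) letter-to-digit mapping and ranks candidate texts by an n-gram distance. The abstract does not specify the recoding details or the distance formula, so everything here is an assumption made for illustration: the function names (`t9_recode`, `ngram_distance`, `select_candidates`), the choice to collapse all non-letter characters into a single `0`, the particular frequency-based distance, the default n-gram length, and the use of the same T9 recoding for the selection step. It is not the authors' implementation.

```python
from collections import Counter

# Standard phone keypad layout (the usual T9 letter-to-digit assignment).
_KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
_LETTER_TO_DIGIT = {ch: d for d, letters in _KEYPAD.items() for ch in letters}


def t9_recode(text):
    """Map letters to keypad digits.

    Assumption: every run of other characters (spaces, punctuation,
    digits, accented letters) is collapsed into a single '0'.
    """
    out = []
    for ch in text.lower():
        digit = _LETTER_TO_DIGIT.get(ch, "0")
        if digit == "0" and out and out[-1] == "0":
            continue  # collapse consecutive separators
        out.append(digit)
    return "".join(out)


def ngram_profile(text, n):
    """Relative frequency of each character n-gram in text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def ngram_distance(a, b, n=8):
    """Assumed distance: mean squared relative frequency difference.

    Returns 0.0 for identical profiles and 1.0 for disjoint ones.
    """
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    grams = set(pa) | set(pb)
    if not grams:
        return 0.0
    total = 0.0
    for g in grams:
        fa, fb = pa.get(g, 0.0), pb.get(g, 0.0)
        total += ((fa - fb) / (fa + fb)) ** 2
    return total / len(grams)


def select_candidates(suspicious, sources, k=10, n=8):
    """Return the names of the k texts in `sources` closest to `suspicious`."""
    recoded = t9_recode(suspicious)
    ranked = sorted(
        sources,
        key=lambda name: ngram_distance(recoded, t9_recode(sources[name]), n),
    )
    return ranked[:k]
```

For example, `t9_recode("Hello, world!")` returns `"435560967530"`. Recoding onto ten symbols deliberately loses information, which makes exact string matching cheaper and somewhat tolerant of small letter substitutions, at the cost of more spurious short matches that a later joining step would have to filter.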
Type: paper
Language: English
Conference: 3rd PAN Workshop, Uncovering Plagiarism, Authorship and Social Software Misuse, with the 25th Annual Conference of the Spanish Society for Natural Language Processing, SEPLN 2009
Year: 2009
Volume: 502
Pages: 19-23
Access: reserved

Use this identifier to cite or link to this document: https://hdl.handle.net/10281/185451