We present a detailed description of an algorithm tailored to detect external plagiarism in the PAN-09 competition. The algorithm is divided into three steps: (1) a first reduction of the size of the problem by a selection of ten suspicious plagiarists, using an n-gram distance on properly recoded texts; (2) a search for matches after T9-like recoding; (3) a “joining algorithm” that merges selected matches and is able to detect obfuscated plagiarism. The results are briefly discussed. Keywords: n-grams, plagiarism, coding, string matching
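
The abstract names the ingredients of the method without giving formulas, so the following is only a minimal sketch of how T9-like recoding, an n-gram distance, and a match search could look. Everything in it is an assumption made for illustration: the keypad table, the function names `t9_recode`, `ngram_distance` and `common_matches`, the particular normalised squared-difference distance, and the default n-gram length are not taken from the paper. The joining step is not sketched.

```python
from collections import Counter

# Assumed keypad grouping (standard phone layout); the paper's table may differ.
KEYPAD = {
    "abc": "2", "def": "3", "ghi": "4", "jkl": "5",
    "mno": "6", "pqrs": "7", "tuv": "8", "wxyz": "9",
}
T9 = {ch: digit for letters, digit in KEYPAD.items() for ch in letters}


def t9_recode(text: str) -> str:
    """Map letters to keypad digits; collapse each run of other characters to one '0'."""
    out = []
    for ch in text.lower():
        code = T9.get(ch, "0")
        if code == "0" and out and out[-1] == "0":
            continue  # skip repeated separators
        out.append(code)
    return "".join(out)


def ngram_counts(s: str, n: int) -> Counter:
    """Count every overlapping n-gram of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def ngram_distance(a: str, b: str, n: int = 3) -> float:
    """Illustrative distance in [0, 1]: mean squared relative count difference."""
    fa, fb = ngram_counts(a, n), ngram_counts(b, n)
    keys = fa.keys() | fb.keys()
    if not keys:
        return 0.0
    total = sum(((fa[k] - fb[k]) / (fa[k] + fb[k])) ** 2 for k in keys)
    return total / len(keys)


def common_matches(src: str, susp: str, k: int) -> list:
    """Return (src_pos, susp_pos) pairs where the two strings share a k-gram."""
    index = {}
    for i in range(len(src) - k + 1):
        index.setdefault(src[i:i + k], []).append(i)
    return [
        (i, j)
        for j in range(len(susp) - k + 1)
        for i in index.get(susp[j:j + k], [])
    ]
```

In such a sketch, candidate sources for a suspicious text would be ranked by `ngram_distance` and only the closest ones passed to `common_matches`; recoding to a small alphabet makes the match search less sensitive to small character-level changes, at the cost of more accidental matches for short `k`.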

Basile, C., Benedetto, D., Caglioti, E., Cristadoro, G., Degli Esposti, M. (2009). A plagiarism detection procedure in three steps: Selection, matches and squares. In Proceedings of SEPLN 2009 - 3rd Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2009 and 1st International Competition on Plagiarism Detection; San Sebastian (Donostia); Spain; 10 September 2009 (pp.19-23). CEUR-WS.

A plagiarism detection procedure in three steps: Selection, matches and squares

Cristadoro, G;
2009

Type: paper
Keywords: n-grams; plagiarism; coding; string matching
Language: English
Conference: SEPLN 2009 - 3rd Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2009 and 1st International Competition on Plagiarism Detection
Conference year: 2009
Proceedings: Proceedings of SEPLN 2009 - 3rd Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, PAN 2009 and 1st International Competition on Plagiarism Detection; San Sebastian (Donostia); Spain; 10 September 2009
Year: 2009
Volume: 502
Pages: 19-23
Access rights: reserved
Use this identifier to cite or link to this document: https://hdl.handle.net/10281/185451