
Number 2

Plagiarism Detection using Sequential Pattern Mining

  • Ali El-matarawy and Mohammad El-ramly and Reem Bahgat 2013. Plagiarism Detection using Sequential Pattern Mining. International Journal of Applied Information Systems. 5, 2 (January 2013), 24-29. DOI=http://dx.doi.org/10.5120/ijais450846
  • @article{10.5120/ijais2017451568,
    author = {Ali El-matarawy and Mohammad El-ramly and Reem Bahgat},
    title = {Plagiarism Detection using Sequential Pattern Mining},
    journal = {International Journal of Applied Information Systems},
    issue_date = {January 2013},
    volume = {5},
    number = {},
    month = {January},
    year = {2013},
    issn = {},
    pages = {24-29},
    numpages = {},
    url = {/archives/volume5/number2/416-0846},
    doi = { 10.5120/ijais12-450846},
    publisher = {© 2012 by IJAIS Journal},
    address = {}
    }
    
  • %1 450846
    %A Ali El-matarawy
    %A Mohammad El-ramly
    %A Reem Bahgat
    %T Plagiarism Detection using Sequential Pattern Mining
    %J International Journal of Applied Information Systems
    %@ 
    %V 5
    %N 
    %P 24-29
    %D 2013
    %I © 2012 by IJAIS Journal
    

Abstract

This research presents EgyCD, a new technique for plagiarism detection based on sequential pattern mining. Over the last decade, many techniques and tools for software clone detection have been proposed, including textual, lexical, syntactic, and semantic approaches. This paper explores the potential of data mining techniques in plagiarism detection. In particular, it proposes a plagiarism detection technique based on sequential pattern mining (SPM): words or statements are treated as a sequence of transactions, which the SPM algorithm processes to find frequent itemsets. An experiment to discover copy/paste in source text gave good results in a reasonable and acceptable time.
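
The idea of mining frequent word sequences to expose copied passages can be illustrated with a minimal sketch. It is not the EgyCD algorithm itself, whose details are not given in this abstract. It assumes whitespace tokenization, contiguous word sequences, and support counted as the number of documents containing a sequence. The function names (`tokenize`, `frequent_sequences`, `maximal_sequences`) and the `min_support` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase, whitespace-separated words stand in for "items".
    return text.lower().split()


def contiguous_ngrams(tokens, n):
    # Distinct contiguous word sequences of length n in one document.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def frequent_sequences(documents, min_support=2):
    """Level-wise (Apriori-style) mining of contiguous word sequences.

    Returns a dict mapping each frequent sequence to the number of
    documents that contain it.
    """
    token_lists = [tokenize(d) for d in documents]
    frequent = {}
    previous = None  # frequent sequences of length n - 1
    n = 1
    while True:
        counts = defaultdict(int)
        for tokens in token_lists:
            for gram in contiguous_ngrams(tokens, n):
                # Apriori property: a sequence can only be frequent if its
                # two sub-sequences of length n - 1 are frequent.
                if previous is None or (
                    gram[:-1] in previous and gram[1:] in previous
                ):
                    counts[gram] += 1
        current = {g: c for g, c in counts.items() if c >= min_support}
        if not current:
            break
        frequent.update(current)
        previous = current
        n += 1
    return frequent


def maximal_sequences(frequent):
    # Drop every sequence that is a prefix or suffix of a longer frequent one.
    covered = set()
    for gram in frequent:
        if len(gram) > 1:
            covered.add(gram[:-1])
            covered.add(gram[1:])
    return sorted(g for g in frequent if g not in covered)


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog",
        "a quick brown fox sleeps near the lazy dog",
    ]
    for seq in maximal_sequences(frequent_sequences(docs)):
        print(" ".join(seq))
```

Under these assumptions, the longest sequences shared by at least `min_support` documents are the candidate copied passages. A practical detector would also need to handle reordering and paraphrase, which this sketch ignores.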

Keywords

Plagiarism Detector, Plagiarized Clones, Textual Approach, Lexical Approach, Syntactic Approach, Data Mining, Apriori Property, Sequential Pattern Mining

Index Terms

Computer Science
Information Sciences