AUTHOR=Gandhi Nandini , Gopalan Kaushik , Prasad Prajish TITLE=A Support Vector Machine based approach for plagiarism detection in Python code submissions in undergraduate settings JOURNAL=Frontiers in Computer Science VOLUME=6 YEAR=2024 URL=https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2024.1393723 DOI=10.3389/fcomp.2024.1393723 ISSN=2624-9898 ABSTRACT=

Mechanisms for plagiarism detection play a crucial role in maintaining academic integrity, both penalizing wrongdoing and serving as a preemptive deterrent against bad behavior. This manuscript proposes a customized algorithm tailored to detect source code plagiarism in the Python programming language. Our approach combines textual and syntactic techniques, employing a support vector machine (SVM) to combine various indicators of similarity and calculate the resulting similarity scores. The algorithm was trained and tested using code submissions for 4 coding problems from each of 45 volunteers; 15 of these volunteers provided original submissions, while the other 30 provided plagiarized samples. The submissions for two of the problems were used for training and those for the other two for testing, using the leave-p-out cross-validation strategy to avoid overfitting. We compare the performance of the proposed method with two widely used tools, MOSS and JPlag, and find that it yields a small but significant improvement in accuracy over JPlag, while significantly outperforming MOSS in flagging plagiarized samples.
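The idea of feeding several similarity indicators into an SVM can be illustrated with a minimal sketch. It is not the authors' implementation, since this abstract does not specify their feature set, kernel, or training procedure. The sketch assumes two hypothetical pairwise features: a textual one (difflib's SequenceMatcher ratio on the raw source) and a syntactic one (the same ratio on the sequence of AST node types, which ignores identifiers). It trains a linear SVM from scratch with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. The helper names (pair_features, train_linear_svm, svm_score, leave_p_out_splits) and the toy feature values are invented for illustration. The leave-p-out helper assumes that p = 2 problems are held out for testing, as the abstract describes.

```python
import ast
import difflib
import random
from itertools import combinations


def textual_similarity(a: str, b: str) -> float:
    """Character-level similarity of the raw source text, in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def ast_node_types(src: str) -> list[str]:
    """AST node type names in ast.walk order; identifiers are discarded,
    so consistently renaming variables leaves this sequence unchanged."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


def syntactic_similarity(a: str, b: str) -> float:
    """Similarity of the two AST node-type sequences, in [0, 1]."""
    return difflib.SequenceMatcher(None, ast_node_types(a), ast_node_types(b)).ratio()


def pair_features(a: str, b: str) -> list[float]:
    """Hypothetical feature vector for one pair of submissions."""
    return [textual_similarity(a, b), syntactic_similarity(a, b)]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss.
    Labels: +1 = plagiarized pair, -1 = original pair. A constant 1.0 is
    appended to every vector so the bias is learned as an ordinary weight."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the correct side
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_score(w, features) -> float:
    """Decision value w . [features, 1]; > 0 means 'flag as plagiarized'."""
    return sum(wi * xi for wi, xi in zip(w, list(features) + [1.0]))


def leave_p_out_splits(problems, p=2):
    """Yield (train, test): every way of holding out p problems for testing."""
    for test in combinations(problems, p):
        train = tuple(q for q in problems if q not in test)
        yield train, test


if __name__ == "__main__":
    original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
    renamed = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
    # Only identifiers differ, so the syntactic feature is 1.0.
    print(pair_features(original, renamed))

    # Invented toy features [textual, syntactic]; not data from the study.
    X = [[0.90, 0.95], [0.85, 0.90], [0.95, 0.85], [0.92, 0.97],
         [0.10, 0.15], [0.20, 0.10], [0.15, 0.20], [0.05, 0.10]]
    y = [1, 1, 1, 1, -1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print(svm_score(w, pair_features(original, renamed)))
    for train, test in leave_p_out_splits([1, 2, 3, 4], p=2):
        print("train on", train, "test on", test)
```

A linear kernel keeps the learned weights interpretable as the relative importance of each similarity indicator. With four problems and p = 2, the splitting helper produces six train/test partitions.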