The Effect of Text Normalization with Text Expansion on Detecting Spam Comments on YouTube

Imam Thoib; Arief Setyanto; Suwanto Raharjo

doi:10.29207/resti.v2i3.602

Imam Thoib Universitas Amikom Yogyakarta
Arief Setyanto Universitas Amikom Yogyakarta
Suwanto Raharjo Institut Sains & Teknologi AKPRIND Yogyakarta

Keywords: spam detection, text normalization, text expansion, youtube spam comments

Abstract

The popularity of YouTube as the largest video-sharing website in the world gives spammers opportunities to benefit from the platform in illegal ways by posting spam comments on YouTube videos. Spam comments are very troubling to channel owners, and their variants are becoming more difficult to detect. One such variant uses abbreviations, symbols, terms, or misspelled words to make detection difficult. This research evaluates several classification techniques and employs a text normalization method called TextExpansion to deal with this problem. It uses the YouTube Spam Collection dataset from the UCI Machine Learning Repository, which is composed of five datasets, each containing text comments extracted from YouTube videos (Psy, Katy Perry, LMFAO, Eminem, and Shakira). The evaluation results show that TextExpansion is able to produce the highest accuracy value, 90.23%. To determine the impact of applying the TextExpansion method, this research conducted a t-test for each dataset. The t-test results for every dataset show P(T<=t) two-tail < 0.05, which indicates a significant impact after applying text normalization using TextExpansion.
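
To make the idea of normalizing abbreviated spam text concrete, here is a minimal sketch of dictionary-based normalization. It is not the TextExpansion method evaluated in the paper, whose internals the abstract does not describe. The lookup table `ABBREVIATIONS`, the token pattern, and the function name `normalize` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lookup table: illustrative entries only,
# not the lexicon used by TextExpansion or by the paper.
ABBREVIATIONS = {
    "u": "you",
    "ur": "your",
    "pls": "please",
    "plz": "please",
    "sub": "subscribe",
    "chk": "check",
    "vid": "video",
}

# Assumed tokenization: runs of lowercase letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize(comment: str) -> str:
    """Lowercase a comment, tokenize it, and expand known abbreviations."""
    tokens = TOKEN_RE.findall(comment.lower())
    # Tokens without a dictionary entry pass through unchanged.
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


if __name__ == "__main__":
    print(normalize("Plz chk my vid and SUB 2 my channel!!!"))
```

In a pipeline like the one the abstract outlines, a step of this kind would run before feature extraction, so that variants such as "plz" and "please" map to the same token before a classifier sees them.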
