Data Clustering for Sentiment Classification with Naïve Bayes and Support Vector Machine

  • Bayu Yanuargi Universitas AMIKOM Yogyakarta
  • Ema Utami Universitas AMIKOM Yogyakarta
  • Kusrini Universitas AMIKOM Yogyakarta
  • Arli Aditya Parikesit Indonesia International Institute for Life Sciences
Keywords: sentiment analysis, hotel, clustering, naïve bayes, support vector machine

Abstract

Visitor reviews play a crucial role in the success of a business, particularly in hospitality and service industries such as hotels. The growth of internet technology has made it easier for guests to share their experiences, which can influence potential customers. Google Maps is one of the platforms used for posting and searching for reviews. This research uses review data crawled from Google Maps with the Playwright library. However, the large volume of reviews makes analysis and topic-based categorization (for example by service quality, hotel location, and operational hours) challenging. To address this, DBSCAN is used to cluster the reviews by topic. Clustering makes sentiment classification more targeted and allows two machine learning algorithms, Naïve Bayes and Support Vector Machine (SVM), to be compared within each topic. In the operational hours cluster, Naïve Bayes was more accurate (0.87) than SVM (0.78). In the location and service clusters, SVM was more accurate (0.89 and 0.88, respectively), while Naïve Bayes held steady at 0.86 in both. Both models had an average training time of less than one second, excluding preprocessing.
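
The topic-clustering step described in the abstract can be illustrated with a minimal DBSCAN sketch. The abstract does not state the feature representation, distance metric, or parameter values, so everything here is an assumption made for illustration: bag-of-words counts, cosine distance, the toy reviews, and the hypothetical values `eps=0.3` and `min_samples=3`. It is not the authors' implementation.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one review (assumed representation)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def dbscan(vectors, eps, min_samples):
    """Return one label per vector: 0, 1, ... for clusters, -1 for noise."""
    n = len(vectors)
    # Precompute each point's eps-neighborhood (a point neighbors itself).
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # only core points expand
    return labels

# Toy reviews (invented for illustration, not from the paper's dataset).
reviews = [
    "friendly staff fast service",
    "staff friendly service slow",
    "service fast staff helpful",
    "location near beach station",
    "great location near beach",
    "location near station quiet",
    "pool closed early",
]
labels = dbscan([vectorize(r) for r in reviews], eps=0.3, min_samples=3)
for label, review in zip(labels, reviews):
    print(label, review)
# labels -> [0, 0, 0, 1, 1, 1, -1]
```

In this toy run the service-related reviews form one cluster, the location-related reviews form another, and the unrelated review is marked as noise. A per-topic sentiment classifier such as Naïve Bayes or SVM would then be trained on each cluster separately.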

Published
2024-12-28
How to Cite
Yanuargi, B., Utami, E., Kusrini, & Parikesit, A. A. (2024). Data Clustering for Sentiment Classification with Naïve Bayes and Support Vector Machine. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 8(6), 819–827. https://doi.org/10.29207/resti.v8i6.6139
Section
Information Technology Articles