Word2Vec Approaches in Classifying Schizophrenia Through Speech Pattern

Putri Alysia Azis; Tenriola Andi; Dewi Fatmarani Surianto; Nur Azizah Eka Budiarti; Andi Akram Nur Risal; Zulhajji Zulhajji

doi:10.29207/resti.v9i2.6323

Putri Alysia Azis Universitas Negeri Makassar
Tenriola Andi Universitas Negeri Makassar
Dewi Fatmarani Surianto Universitas Negeri Makassar https://orcid.org/0009-0003-3169-9993
Nur Azizah Eka Budiarti Universitas Negeri Makassar
Andi Akram Nur Risal Universitas Negeri Makassar
Zulhajji Zulhajji Universitas Negeri Makassar

Keywords: Natural Language Processing, Schizophrenia, Speech Pattern, Word2Vec

Abstract

Schizophrenia is a chronic brain disorder characterized by symptoms such as delusions, hallucinations, and disorganized speech, all of which make accurate diagnosis difficult. This research investigates a Natural Language Processing (NLP) framework that uses Word2Vec to classify the speech patterns of schizophrenia patients, with the aim of determining whether the features produced by its two architectures differ significantly. The dataset comprises speech transcriptions from 121 schizophrenia patients and 121 non-schizophrenia participants, collected through structured interviews. The study compares the two Word2Vec architectures, Continuous Bag-of-Words (CBOW) and Skip-Gram (SG), to determine how effectively each classifies schizophrenia speech patterns. The results indicate that the SG architecture, with hyperparameter tuning, produces more detailed word representations, particularly for low-frequency words. It also yields more accurate classification, achieving an F1-score of 93.81%. These results underscore the effectiveness of the framework in handling structured and abstract linguistic patterns. By utilizing the advantages of both static and contextual embeddings, this approach offers significant potential for clinical applications, providing a reliable tool for improving schizophrenia diagnosis through automated speech analysis.
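To make the CBOW versus SG comparison concrete, the following is a minimal sketch in plain Python of how the two architectures form training examples from the same token sequence, together with the F1-score used as the evaluation metric. It is not the authors' implementation: the function names, the window size, and the toy tokens are assumptions chosen for illustration, and no model is actually trained here.

```python
from typing import List, Sequence, Tuple


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[Tuple[str, ...], str]]:
    """CBOW: one example per position, predicting the centre word
    from all of its surrounding context words at once."""
    examples = []
    for i, target in enumerate(tokens):
        context = tuple(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        if context:
            examples.append((context, target))
    return examples


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-Gram: one example per (centre, context) pair, predicting
    each context word separately from the centre word."""
    examples = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + 1 + window)):
            if j != i:
                examples.append((centre, tokens[j]))
    return examples


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy tokens (hypothetical, not taken from the study's dataset).
    toy = ["a", "b", "c", "d"]
    print(cbow_examples(toy, window=1))
    print(skipgram_examples(toy, window=1))
    print(f1_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this sketch, SG emits a separate example for every (centre, context) pair, whereas CBOW merges the whole context into a single example. One commonly cited intuition, consistent with the abstract's observation, is that this gives low-frequency words more individual training signal under SG.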
