Identifying Text-based Online Thai Hate Speech in Social Media
DOI: https://doi.org/10.14456/jiskku.2024.27
Keywords: Online hate speech, Classification, Social network service, Keyword detection, Text mining
Abstract
Purpose: This work proposes a method to detect Thai online hate speech, which is categorized into five types: ethnic-based, gender-based, ableist, belief-based, and social status-based hate speech. Online comments from popular social network services in Thailand are collected and annotated to serve as training data.
Methodology: Machine learning approaches are employed to perform multiclass classification of hate speech. In addition, we use the information gain (IG) score to determine which terms are most significant in conveying the hateful intent of each hate speech class.
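The IG scoring step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each comment has already been segmented into Thai word tokens (segmentation is not shown), treats "the term occurs in the comment" as a binary feature, and scores terms for one class at a time against a one-vs-rest indicator. The function names and the placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the comment' with respect to labels.

    docs: list of token sets, one per comment; labels: parallel list of labels.
    IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)]
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) + (
        len(without_term) / n
    ) * entropy(without_term)
    return entropy(labels) - conditional


def top_terms_for_class(docs, labels, target, k=10):
    """Rank terms by IG against a one-vs-rest indicator for `target`.

    Only terms occurring in at least one `target` comment are considered,
    so the list stays tied to that class (an assumption of this sketch).
    """
    binary = [lab == target for lab in labels]
    vocab = set().union(*(doc for doc, lab in zip(docs, labels) if lab == target))
    scored = [(information_gain(docs, binary, t), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:k]]


# Toy example with placeholder tokens (not real data).
docs = [{"w1", "a"}, {"w1", "b"}, {"w2", "a"}, {"w2", "c"}]
labels = ["gender-based", "gender-based", "ethnic-based", "ethnic-based"]
print(top_terms_for_class(docs, labels, "gender-based", k=3))
# prints ['w1', 'b', 'a']
```

Here "w1" separates the two classes perfectly (IG = 1 bit), while "a" appears equally in both and carries no information (IG = 0). On real data, calling `top_terms_for_class` once per hate speech class would yield one ranked term list per class.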
Findings: The hate speech detection results reveal that a language model combining TF-IDF weighting and trigram features, used with an SVM classifier, achieved the best detection performance, with an average F-measure of 0.76. The IG scores also provide a list of significant terms related to each specific hate speech class.
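The feature representation and the averaged F-measure named in the findings can be illustrated with a small standard-library sketch. Several details are assumptions, since the abstract does not state them: word-level trigrams over pre-segmented tokens (rather than character trigrams, and without unigrams or bigrams), raw counts as term frequency with an unsmoothed idf = ln(N/df), and macro-averaging of per-class F1. The SVM classifier itself is not implemented here, and all names are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(tokens, n=3):
    """Contiguous word n-grams; n=3 gives trigrams, e.g. a b c d -> 'a b c', 'b c d'."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(tokenized_docs, n=3):
    """Sparse TF-IDF vectors (feature -> weight) over word n-gram features.

    Uses raw counts as TF and idf = ln(N / df) without smoothing.
    """
    grams = [Counter(word_ngrams(doc, n)) for doc in tokenized_docs]
    n_docs = len(grams)
    # Document frequency: each comment contributes at most once per n-gram.
    df = Counter(g for doc in grams for g in doc)
    return [{g: tf * math.log(n_docs / df[g]) for g, tf in doc.items()} for doc in grams]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with placeholder tokens and labels (not real data).
vectors = tfidf_vectors([["a", "b", "c", "d"], ["a", "b", "c"], ["x", "y", "z"]])
print(vectors[1])  # only 'a b c', weighted ln(3 / 2)
print(macro_f1(["gender", "gender", "ethnic", "ableism"],
               ["gender", "ethnic", "ethnic", "ableism"]))  # 7/9, about 0.778
```

One possible way to train an SVM on these dictionaries is to convert them with scikit-learn's `DictVectorizer` and fit a `LinearSVC`; this is an illustrative setup, not necessarily the configuration used in the study. Whether "F-measure in average" means macro- or weighted averaging is also an assumption here.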
Applications of this study: Hate speech detection helps to analyze Thai text messages that may be hurtful to recipients. It can proactively filter out and block such messages before they are posted, helping to prevent cyberbullying on social media platforms, and it can warn users who may unintentionally choose risky Thai words that could emotionally hurt readers.
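A pre-posting check of the kind described above could combine the classifier with the IG-derived term lists. The sketch below covers only the keyword part: it matches a segmented message against hypothetical per-class term lists (placeholders, not real Thai terms), reports the matched terms so the user can be warned, and blocks the post once a hypothetical threshold of distinct matches is reached. The threshold, the term lists, and the function and class names are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-class term lists. In practice they might be taken from the
# top IG-ranked terms of each class; these placeholders are not real Thai terms.
FLAGGED_TERMS = {
    "gender-based": {"term_g1", "term_g2"},
    "ethnic-based": {"term_e1"},
}


@dataclass
class PostCheck:
    allowed: bool  # False means the post would be blocked
    matches: dict  # class name -> sorted matched terms, shown in the warning

    @property
    def warn(self) -> bool:
        """Any match at all triggers a warning to the user."""
        return bool(self.matches)


def check_before_posting(tokens, flagged_terms, block_threshold=2):
    """Check a segmented message against per-class term lists before posting.

    Any match triggers a warning; the post is blocked once the number of
    distinct matched terms reaches block_threshold (a hypothetical policy).
    """
    token_set = set(tokens)
    matches = {
        cls: sorted(token_set & terms)
        for cls, terms in flagged_terms.items()
        if token_set & terms
    }
    distinct = set().union(*matches.values())
    return PostCheck(allowed=len(distinct) < block_threshold, matches=matches)


result = check_before_posting(["hello", "term_g1"], FLAGGED_TERMS)
print(result.allowed, result.warn, result.matches)
# prints: True True {'gender-based': ['term_g1']}
```

Keeping the warning and blocking decisions separate mirrors the two uses in the abstract: reminding users about risky wording, and disallowing clearly hateful messages.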