The Public Bus Complaint Classification by Deep Learning Model
DOI: https://doi.org/10.14456/jiskku.2022.2
Keywords: Text Classification, Machine Learning, Deep Learning, Public Bus
Abstract
Purpose: The objectives of this study were to design and develop a public bus complaint classification system using a deep learning approach and to assess its classification accuracy.
Methodology: The study began by designing and developing the public bus complaint classifier with a deep learning approach, creating a binary model that segments text into words using the deep-cut algorithm. All words were then transformed into a bag of words and indexed, so that keyword weights could be calculated for classifying statements. To mitigate the problem of data duplication, the t-SNE method was applied to map the distribution of similar words, which was used to formulate seven classes of public bus complaints: driving, shuttle stop, service provider, bus schedule operation, vehicle, service equipment supply, and epidemic prevention.
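The word indexing, bag-of-words, and keyword-weighting steps can be illustrated with a minimal sketch. It is not the authors' deep learning model: it assumes the complaints have already been segmented into words (the study uses the deep-cut algorithm for that step), uses a swapped-in TF-IDF-style weighting with a simple sum-of-weights classifier, and runs on hypothetical toy complaints. The t-SNE step is omitted; scikit-learn provides `sklearn.manifold.TSNE` for that kind of projection.

```python
import math
from collections import Counter

# HYPOTHETICAL toy data: complaints already split into words. In the study this
# segmentation is done by the deep-cut algorithm; here the tokens are given directly.
TRAINING = [
    (["driver", "speeding", "rude"], "driving"),
    (["driver", "loud", "shouting"], "driving"),
    (["bus", "late", "schedule"], "bus schedule operation"),
    (["waited", "hour", "schedule"], "bus schedule operation"),
    (["engine", "loud", "smoke"], "vehicle"),
    (["air", "conditioner", "broken"], "vehicle"),
]


def build_index(token_lists):
    """Word indexing: give each distinct word an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for word in tokens:
            vocab.setdefault(word, len(vocab))
    return vocab


def bag_of_words(tokens, vocab):
    """Count vector over the indexed vocabulary; words outside the index are ignored."""
    vec = [0] * len(vocab)
    for word in tokens:
        if word in vocab:
            vec[vocab[word]] += 1
    return vec


def class_keyword_weights(data):
    """TF-IDF-style keyword weights, treating all complaints of one class as one document."""
    class_tokens = {}
    for tokens, label in data:
        class_tokens.setdefault(label, []).extend(tokens)
    n_classes = len(class_tokens)
    # Number of classes in which each word appears (document frequency).
    df = Counter(word for toks in class_tokens.values() for word in set(toks))
    weights = {}
    for label, toks in class_tokens.items():
        tf = Counter(toks)
        weights[label] = {
            word: (count / len(toks)) * math.log(n_classes / df[word])
            for word, count in tf.items()
        }
    return weights


def classify(tokens, weights):
    """Score each class by summing its weights for the complaint's words; highest score wins."""
    scores = {label: sum(w.get(t, 0.0) for t in tokens) for label, w in weights.items()}
    return max(scores, key=scores.get), scores


vocab = build_index(tokens for tokens, _ in TRAINING)
weights = class_keyword_weights(TRAINING)
label, scores = classify(["engine", "smoke"], weights)
print(label)  # vehicle
```

Words that occur in only one class get the largest weight, while a word such as "loud", which appears in both the driving and vehicle toy classes, gets a smaller one. Calling `classify(["loud"], weights)` gives those two classes identical scores, so the tie is broken only by dictionary order: a small-scale version of the term-duplication problem reported in the findings.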
Findings: Overall, classification across all classes was 90% accurate. However, the service provider class, which received the largest number of public bus complaints, achieved an accuracy of only 83%, reflecting the wide variety of information in these complaints, which depended on the user's context. A comparison between the deep-cut algorithm and the fastText algorithm showed that both achieved high accuracy and followed the same trend. The accuracy of tagging service provider problems also reached a good level. These results indicate that the deep-cut learning approach to public bus complaint classification produced accurate results, especially for complaints that raise a single issue, because the issue terms could be defined by their context. In contrast, many of the problems found in the study were caused by terms shared between classes. For instance, 'loudness' could refer to loud noise from drivers (driving class) or to engine sound (vehicle class or service equipment supply class). Therefore, term accuracy had to be measured both semantically and operationally.
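Per-class accuracy can be read as the share of each class's complaints that receive the correct label (per-class recall); that reading is an assumption, since the abstract does not define the metric. The sketch below computes it alongside overall accuracy and counts which class pairs are confused, the kind of check that exposes shared terms such as 'loudness'. All labels are hypothetical toy data, not the study's results.

```python
from collections import Counter, defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class's complaints that were labelled correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return {label: correct[label] / total[label] for label in total}


def overall_accuracy(y_true, y_pred):
    """Fraction of all complaints labelled correctly, regardless of class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confused_pairs(y_true, y_pred):
    """Count (true class, predicted class) pairs for every misclassified complaint."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# HYPOTHETICAL labels for illustration only; not the study's data.
y_true = ["service provider"] * 4 + ["driving"] * 3 + ["vehicle"] * 3
y_pred = [
    "service provider", "service provider", "service provider", "vehicle",
    "driving", "driving", "driving",
    "vehicle", "vehicle", "driving",  # e.g. engine noise mistaken for a driving complaint
]
print(per_class_accuracy(y_true, y_pred))
print(overall_accuracy(y_true, y_pred))
print(confused_pairs(y_true, y_pred))
```

Reporting per-class figures next to the overall figure shows how one large, heterogeneous class can sit below the overall accuracy, and the confused pairs point to where shared terms blur class boundaries.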
Application of the study: The findings can help the responsible agencies improve their services to meet users' needs.