Predictive Modeling for Classification of SMS Spam Using NLP and ML Techniques
DOI:
https://doi.org/10.63282/3050-9262.IJAIDSML-V2I4P107
Keywords:
SMS Detection, Naïve Bayes, Spam Detection, Natural Language Processing, SMS Spam Collection Dataset, SVM, Machine Learning, KNN, Random Forest
Abstract
Modern telecommunication systems expose users and service providers to fraudulent communication in the form of SMS spam, which delivers unwanted messages, phishing attempts, and financial scams to millions of users worldwide. This study evaluates NLP and ML techniques for classifying SMS spam on the SMS Spam Collection dataset (5,574 messages, 87.37% legitimate and 12.63% spam). It addresses the challenge of finding effective and precise detection methods for increasingly sophisticated spam. Conventional keyword-based filters struggle with linguistic variation and evolving spam profiles, which motivates more advanced computational approaches. The proposed pipeline combines text preprocessing, feature extraction through TF-IDF vectorization, and Support Vector Machine (SVM) classification, trained on a stratified 80-20 data split with hyperparameter tuning. Preprocessing applies tokenization, punctuation removal, and stemming, after which TF-IDF converts the cleaned text into numerical features. The SVM model achieved 97.85% accuracy in distinguishing spam from legitimate messages, outperforming Random Forest (95.46%), Naïve Bayes (93.9%), and KNN (92.26%). These results indicate that SVM-based NLP techniques offer an accurate, scalable, and practical way to improve telecommunications security and user experience in modern messaging systems.
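The preprocessing and TF-IDF steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the suffix-stripping `stem` function is a crude hypothetical stand-in for whatever stemmer the study used, the smoothed IDF formula and L2 normalization are assumed conventions, and the three example messages are made up rather than taken from the SMS Spam Collection dataset. The SVM classifier, the stratified split, and the hyperparameter tuning are not shown.

```python
import math
import string
from collections import Counter

# Hypothetical suffix list for a crude stemmer; a real system would more
# likely use an established algorithm such as the Porter stemmer.
SUFFIXES = ("ing", "ed", "s")


def stem(token):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(message):
    """Lowercase, remove punctuation, tokenize on whitespace, then stem."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = message.lower().translate(table)
    return [stem(tok) for tok in cleaned.split()]


def tfidf_vectors(messages):
    """Return (vocabulary, L2-normalized TF-IDF rows) for a list of messages."""
    docs = [preprocess(m) for m in messages]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many messages each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Made-up example messages, not taken from the SMS Spam Collection dataset.
messages = [
    "WINNER!! Claim your free prize now, call 09061701461",
    "Are we still meeting for lunch today?",
    "Free entry: text WIN to claim prizes!!!",
]
vocab, rows = tfidf_vectors(messages)
print(preprocess(messages[2]))
print(len(vocab), "features per message")
```

Each row is a fixed-length numeric vector that a classifier such as an SVM could take as input. Tokens shared across messages (here `free`, `claim`, and `prize`) receive a lower IDF weight than tokens that appear in only one message.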










