A reinforcement learning approach for training complex decision making models.

Authors

  • Sarbaree Mishra, Program Manager at Molina Healthcare Inc., USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V3I3P109

Keywords:

Reinforcement learning, decision-making, intelligent systems, complex models, policy optimization, machine learning, adaptive algorithms, reward maximization, deep reinforcement learning, neural network architectures, value iteration, policy iteration, Monte Carlo methods, temporal difference learning, actor-critic algorithms, Q-learning, deep Q-networks, hierarchical reinforcement learning, multi-agent reinforcement learning, stochastic policies, environment modeling, state-action space, exploration strategies, exploitation strategies, autonomous systems, real-time decision-making, sequential learning, transfer learning, meta-learning, imitation learning, curriculum learning, optimization techniques, dynamic systems

Abstract

Reinforcement learning (RL) is a powerful branch of machine learning in which systems discover effective strategies through trial-and-error interaction with their environments, which makes it a natural candidate for complex decision-making problems. Unlike traditional methods, which depend on predefined rules or labeled datasets, RL rewards models for desired behavior, so they train themselves and keep adapting as the environment changes. Because of this capacity to self-learn and improve, RL is becoming increasingly important in application areas such as robotics, video games, finance, and medicine, where intelligent systems must make subtle decisions in very different settings. This article covers the main ideas of reinforcement learning and discusses how agents learn by balancing exploration and exploitation. We also introduce popular algorithms such as Q-learning, Deep Q-Networks, and policy gradient methods, along with their real-world uses. A central element of the paper is a set of supply chain examples that show RL's potential for training systems to solve complex decision-making problems. In real-world settings, however, RL faces several challenges at once, including sample inefficiency, reward shaping, and scaling to complex problems. We suggest practical remedies, for instance hybrid methods, more accurate environment simulation, and carefully designed reward structures. Finally, we discuss the value of combining RL with other techniques, such as supervised learning and evolutionary algorithms, to obtain better results.
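
To make the exploration/exploitation balance and the Q-learning update named in the abstract concrete, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. The corridor environment, the function name `q_learning`, and all hyperparameter values are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy 1-D corridor (hypothetical environment).

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the last state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Exploration: act randomly with probability epsilon,
            # or when both action values are tied.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                # Exploitation: pick the action with the higher estimate.
                a = 0 if q[s][0] > q[s][1] else 1

            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update: bootstrap from the best next-state value,
            # with no bootstrapping from the terminal state.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(f"state {state}: left={left:.3f} right={right:.3f}")
```

In this toy setting the learned values favor moving right in every non-terminal state. A realistic supply chain problem would need a much richer state and reward design than this sketch.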

Published

2022-10-30

How to Cite

Mishra S. A reinforcement learning approach for training complex decision making models. IJAIDSML [Internet]. 2022 Oct. 30 [cited 2025 Oct. 30];3(3):82-9. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/215