Cloud-Native Reliability: Applying SRE to Serverless and Event-Driven Architectures

Authors

  • Hitesh Allam, Software Engineer at Concor IT, USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V5I3P108

Keywords:

Site Reliability Engineering (SRE), Cloud-native, Serverless computing, Event-driven architecture, Observability, SLIs, SLOs, SLAs, Error budgets, Chaos engineering, Reliability automation, Distributed systems

Abstract

As cloud-native technologies change how contemporary applications are built and operated, enterprises are increasingly adopting Site Reliability Engineering (SRE) to keep these systems durable, scalable, and cost-efficient. This article explores how key SRE ideas, namely automation, observability, error budgets, and service-level objectives (SLOs), can be applied to serverless and event-driven architectures, in contrast to conventional monolithic or microservices-based systems. Serverless platforms and asynchronous event-driven architectures offer substantial benefits, including reduced operational overhead, scalability, and accelerated time-to-market, but they also present fresh challenges for reliability engineering, including limited visibility, complex event flows, and difficulty in incident detection and rollback. The article shows how modern teams apply SRE techniques such as distributed tracing, proactive alerting, chaos engineering, and infrastructure-as-code in highly dynamic, ephemeral computing environments. It presents a case study of a fintech company that moved from containerized workloads to an event-driven serverless architecture, outlining how the company redefined its reliability objectives, integrated observability at each function and event trigger, and automated resilience testing across distributed services. The key findings are that conventional SRE indicators remain relevant but must be interpreted differently in ephemeral environments, and that success depends on collaboration across development, operations, and platform teams. Applying SRE to serverless and event-driven systems not only improves system dependability but also promotes a culture of accountability and continuous improvement, qualities essential for success in the cloud-native environment.
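
As a rough illustration of how an SLO and its error budget might be expressed per invocation rather than per host, the following minimal Python sketch may help. It is not taken from the article: the names `InvocationWindow`, `availability_sli`, and `error_budget_remaining`, and the 99.9% target in the usage note, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class InvocationWindow:
    """Hypothetical counts for one serverless function over an SLO window."""
    total: int   # invocations observed in the window
    failed: int  # invocations that errored or timed out


def availability_sli(window: InvocationWindow) -> float:
    """Fraction of invocations that succeeded (request-based SLI)."""
    if window.total == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return (window.total - window.failed) / window.total


def error_budget_remaining(window: InvocationWindow, slo_target: float) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    allowed_failures = (1.0 - slo_target) * window.total
    if allowed_failures == 0:
        return 1.0 if window.failed == 0 else 0.0
    return 1.0 - window.failed / allowed_failures
```

For example, with an assumed 99.9% target, a window of 100,000 invocations and 50 failures has spent roughly half of its error budget. A request-based ratio like this is one way to read a conventional SRE indicator in an environment where individual function instances are too short-lived for uptime to be meaningful.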

Published

2024-10-30

Section

Articles

How to Cite

Allam H. Cloud-Native Reliability: Applying SRE to Serverless and Event-Driven Architectures. IJAIDSML [Internet]. 2024 Oct. 30 [cited 2025 Oct. 11];5(3):68-79. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/185