Pod-Centric Load Balancing: Reducing Command Cancellations in Large-Scale Kubernetes Clusters via Health-Based Node Penalization
DOI: https://doi.org/10.63282/3050-9262.IJAIDSML-V5I3P124
Keywords: Kubernetes, Pod-Centric Load Balancing, Large-Scale Kubernetes Clusters, Health-Based Node Penalization, Command Cancellation Reduction, Cluster Scheduling, Node Health Monitoring, Pod Placement Optimization, Fault-Tolerant Scheduling, Load Distribution, Resource Allocation, Container Orchestration, Cluster Reliability, High Availability, Performance Optimization, Intelligent Scheduling, Node Penalization Strategy, Kubernetes Scheduler, Pod Scheduling Efficiency, Distributed Systems
Abstract
In large-scale Kubernetes environments, the default kube-proxy load-balancing mechanism, often based on random or round-robin distribution, fails to account for localized node degradation. This leads to a high frequency of "Command Cancellations" and 5xx errors when requests are routed to pods residing on "grey-failing" nodes. In this research, I propose a Pod-Centric Load Balancing framework integrated with the Istio service mesh. I introduce a novel Health-Based Node Penalization (HBNP) algorithm that dynamically adjusts traffic weights based on real-time node-level telemetry (CPU steal, IO wait, and connection resets). My findings demonstrate that by penalizing degraded nodes at the Envoy sidecar level, command cancellations can be reduced by 78% in clusters exceeding 1,000 nodes.
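The abstract does not specify how HBNP combines its three telemetry signals into a traffic weight. The Python sketch below shows one plausible shape, assuming each node reports CPU steal and IO wait as fractions of CPU time and connection resets as a per-second rate. The coefficients, the exponential decay, and names such as `NodeTelemetry`, `penalty`, and `node_weight` are hypothetical illustrations, not the paper's published method. The sketch computes a per-node penalty, converts it to an integer weight, and gives each pod the weight of its host node; a weighted random pick stands in for the Envoy sidecar's load balancer.

```python
"""Hypothetical sketch of Health-Based Node Penalization (HBNP) weighting.

Signal names, coefficients, and thresholds are illustrative assumptions,
not values taken from the paper.
"""
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTelemetry:
    cpu_steal: float          # fraction of CPU time stolen by the hypervisor, 0.0-1.0
    io_wait: float            # fraction of CPU time spent waiting on IO, 0.0-1.0
    conn_resets_per_s: float  # observed TCP connection resets per second


# Assumed signal coefficients (sum to 1.0 so the penalty stays in [0, 1]).
W_STEAL, W_IOWAIT, W_RESETS = 0.4, 0.3, 0.3
RESET_SATURATION = 50.0  # resets/s treated as fully degraded (assumption)
SENSITIVITY = 3.0        # how sharply weight decays as penalty grows (assumption)
MAX_WEIGHT = 100         # weight assigned to a perfectly healthy node


def penalty(t: NodeTelemetry) -> float:
    """Combine normalized degradation signals into a penalty in [0, 1]."""
    resets = min(t.conn_resets_per_s / RESET_SATURATION, 1.0)
    raw = W_STEAL * t.cpu_steal + W_IOWAIT * t.io_wait + W_RESETS * resets
    return min(max(raw, 0.0), 1.0)


def node_weight(t: NodeTelemetry) -> int:
    """Map a node's penalty to an integer weight in [1, MAX_WEIGHT].

    Exponential decay strongly de-prioritizes grey-failing nodes, while the
    floor of 1 keeps them reachable instead of ejecting them outright.
    """
    return max(1, round(MAX_WEIGHT * math.exp(-SENSITIVITY * penalty(t))))


def endpoint_weights(pod_to_node: dict[str, str],
                     telemetry: dict[str, NodeTelemetry]) -> dict[str, int]:
    """Give every pod the weight of the node it is scheduled on."""
    return {pod: node_weight(telemetry[node]) for pod, node in pod_to_node.items()}


def pick_endpoint(weights: dict[str, int], rng: random.Random) -> str:
    """Weighted random choice, standing in for the sidecar's load balancer."""
    pods = list(weights)
    return rng.choices(pods, weights=[weights[p] for p in pods], k=1)[0]


if __name__ == "__main__":
    telemetry = {
        "node-a": NodeTelemetry(cpu_steal=0.01, io_wait=0.02, conn_resets_per_s=0.0),
        "node-b": NodeTelemetry(cpu_steal=0.35, io_wait=0.40, conn_resets_per_s=40.0),
    }
    pods = {"api-1": "node-a", "api-2": "node-a", "api-3": "node-b"}
    weights = endpoint_weights(pods, telemetry)
    print(weights)
    rng = random.Random(0)
    picks = [pick_endpoint(weights, rng) for _ in range(10_000)]
    print({p: picks.count(p) for p in pods})
```

With the sample telemetry, the degraded `node-b` receives a weight of 22 against 97 for the healthy `node-a`, so `api-3` is picked for roughly 10% of requests instead of the one third it would receive under uniform random selection. Keeping a nonzero floor is a design choice: a penalized node still sees a trickle of traffic, so its weight can recover as its telemetry improves.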