Federated AI-Driven Query Optimization for Distributed Cloud Databases
DOI:
https://doi.org/10.63282/3050-9262.IJAIDSML-V7I1P118
Keywords:
Federated Learning, Query Optimization, Distributed Databases, Cloud Computing, AI-Driven Systems, Data Privacy
Abstract
The rapid adoption of distributed cloud databases across multi-cloud and hybrid environments has exposed inefficiencies in traditional query optimization, caused mainly by data heterogeneity, dynamic workloads, and strict data privacy constraints that limit centralized analysis. Conventional cost-based and machine-learning-driven optimizers scale poorly in such environments because they rely on static statistics or require access to globally aggregated query execution data. To address these challenges, this paper proposes a federated AI-driven query optimization framework that performs privacy-preserving optimization across distributed cloud databases without sharing raw data. Each database node trains a local cost estimation model on its own query workload characteristics and execution feedback, and a global model is iteratively refined through secure aggregation of the resulting model updates. The AI-based cost model is integrated with adaptive query plan selection so that execution strategies are adjusted dynamically as workload and resource conditions change. Experimental evaluations on distributed cloud testbeds using benchmark workloads show that the proposed framework reduces query latency, improves resource utilization, and scales better than centralized and traditional optimization approaches. The results indicate that federated AI-driven query optimization is a practical approach for next-generation distributed cloud database systems, balancing performance optimization with data privacy and system autonomy.
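The training loop summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a linear cost model, plain federated averaging weighted by local sample count in place of secure aggregation, and synthetic execution feedback. All function names (`local_update`, `federated_average`, `train_federated`, `choose_plan`, `make_node`) and the plan feature vectors are hypothetical and chosen only for illustration.

```python
import random

def predict_cost(weights, features):
    """Linear cost estimate for one query plan (assumed model form)."""
    return sum(w * x for w, x in zip(weights, features))

def local_update(global_weights, samples, lr=0.1, epochs=5):
    """Refine the global weights with SGD on one node's private feedback.

    `samples` is a list of (plan_features, observed_latency) pairs that
    never leave the node; only the updated weights are returned.
    """
    w = list(global_weights)
    for _ in range(epochs):
        for features, observed in samples:
            err = predict_cost(w, features) - observed
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w

def federated_average(updates):
    """Average node weights, weighting each node by its sample count.

    Plain averaging stands in for secure aggregation in this sketch.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

def train_federated(node_datasets, dim, rounds=50):
    """Alternate local training and global aggregation for a fixed number of rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [(local_update(global_w, s), len(s)) for s in node_datasets]
        global_w = federated_average(updates)
    return global_w

def choose_plan(weights, candidate_plans):
    """Adaptive plan selection: pick the plan with the lowest predicted cost."""
    return min(candidate_plans, key=lambda p: predict_cost(weights, p["features"]))

def make_node(n, true_w, rng):
    """Generate synthetic, noise-free feedback for one node (illustration only)."""
    samples = []
    for _ in range(n):
        x = [rng.random() for _ in true_w]
        samples.append((x, predict_cost(true_w, x)))
    return samples
```

In use, each database node would build its own sample list from local execution logs, `train_federated` would run across nodes, and the resulting weights would be passed to `choose_plan` together with the candidate plans for an incoming query. A production system would replace the averaging step with a secure aggregation protocol so that individual node updates are not visible to the coordinator.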