Generative AI–Enabled Intelligent Query Optimization for Large-Scale Data Analytics Platforms

Authors

  • Dinesh Babu Govindarajulunaidu Sambath Narayanan, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V6I2P117

Keywords:

Generative AI, Query Optimization, Big Data Analytics, Machine Learning, Cost-Based Optimization, Reinforcement Learning, Distributed Databases, Data Lakes

Abstract

The rapid growth of data volumes in distributed cloud infrastructures has increased the computational cost of query processing in large-scale analytics systems such as data lakes, distributed SQL engines, and real-time data warehouses. Traditional rule-based and cost-based query optimization methods are increasingly inadequate for dynamic workloads, heterogeneous data sources, and unpredictable execution conditions. This study presents an Intelligent Query Optimization Framework that uses Generative Artificial Intelligence (Gen-AI) to automatically rewrite queries, optimize plans, and adapt execution dynamically. The proposed system employs reinforcement learning (RL), natural language processing (NLP), and deep neural architectures to produce optimized query formulations, estimate the best execution plans, reduce latency, and increase throughput in large-scale clusters. The system is trained on workloads, query execution histories, and run-time performance metrics from distributed systems such as Apache Spark, Presto, and Snowflake. The research contributions are: (i) a transformer-based mechanism for synthesizing query plans, (ii) a predictive model of workload cost, (iii) a multi-objective optimization scheme that minimizes execution time, resource consumption, and data transfer cost, and (iv) a hybrid architecture that supports both batch and streaming analytics. Extensive experiments on the TPC-H benchmark show improvements of 41% in query latency, 32% in throughput, and 27% in memory usage compared with state-of-the-art optimizers. The framework is scalable, flexible, and effective in changing data environments, making it a significant advance in intelligent data analytics.
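
The abstract names a multi-objective scheme over execution time, resource consumption, and data transfer cost but does not say how the objectives are combined. The following is a minimal sketch of one common approach, weighted-sum scalarization over candidate plans; it is not the authors' method. The `PlanEstimate` type, the plan names, the cost figures, and the weights are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEstimate:
    """Predicted costs for one candidate query plan (all values hypothetical)."""
    name: str
    exec_time_s: float      # predicted execution time, seconds
    resource_units: float   # predicted resource consumption, e.g. CPU-seconds
    transfer_gb: float      # predicted data moved between nodes, GB


def normalize(values):
    """Min-max scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_plan(plans, weights=(0.5, 0.3, 0.2)):
    """Return (plan, score) for the lowest weighted sum of normalized costs."""
    if not plans:
        raise ValueError("need at least one candidate plan")
    w_time, w_res, w_xfer = weights  # assumed weights, not from the paper
    times = normalize([p.exec_time_s for p in plans])
    res = normalize([p.resource_units for p in plans])
    xfer = normalize([p.transfer_gb for p in plans])
    scores = [w_time * t + w_res * r + w_xfer * x
              for t, r, x in zip(times, res, xfer)]
    best = min(range(len(plans)), key=scores.__getitem__)
    return plans[best], scores[best]


# Hypothetical candidates for a single join query.
candidates = [
    PlanEstimate("broadcast_join", exec_time_s=12.0, resource_units=40.0, transfer_gb=8.0),
    PlanEstimate("shuffle_join", exec_time_s=20.0, resource_units=25.0, transfer_gb=3.0),
    PlanEstimate("sort_merge_join", exec_time_s=16.0, resource_units=30.0, transfer_gb=5.0),
]
best_plan, best_score = select_plan(candidates)
print(best_plan.name, round(best_score, 3))
```

With these made-up numbers the balanced plan wins because neither of the other two is good on all three objectives; shifting all weight to execution time selects the fastest plan instead. A weighted sum is only one way to trade off objectives, and a Pareto-front search would be an alternative under the same cost estimates.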

Published

2025-05-20

Section

Articles

How to Cite

1. Govindarajulunaidu Sambath Narayanan DB. Generative AI–Enabled Intelligent Query Optimization for Large-Scale Data Analytics Platforms. IJAIDSML [Internet]. 2025 May 20 [cited 2026 Jan. 13];6(2):153-60. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/332