AI-Enhanced Distributed Databases: Optimizing Query Processing and Replication Strategies for High-Throughput Applications

Sathish Srinivasan; Suresh Bysani Venkata Naga; Krishnaiah Narukulla

doi:10.63282/3050-9262.IJAIDSML-V3I2P109

Authors

Sathish Srinivasan, Member of Technical Staff | PayPal, Core Platforms & Infrastructure, San Francisco Bay Area, California, USA.
Suresh Bysani Venkata Naga, Engineering Leader, SaaS and Distributed Systems, Cohesity, San Francisco Bay Area, California, USA.
Krishnaiah Narukulla, Principal Engineer | Roku and Cohesity, Distributed Systems, Cloud & Machine Learning Expert, San Francisco Bay Area, California, USA.

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V3I2P109

Keywords:

Artificial Intelligence, Distributed Databases, Query Optimization, Replication Strategies, Machine Learning, High-Throughput Applications

Abstract

Distributed databases have long been a foundation for scalable, high-throughput applications. However, as data volumes and application complexity grow, traditional optimization methods may fail to guarantee performance, scalability, and fault tolerance. This paper examines the integration of Artificial Intelligence (AI) techniques into distributed database systems to improve query processing and replication strategies and thereby overcome major performance bottlenecks. Conventional query optimization relies on static cost models and fixed heuristics, which often cannot cope with dynamic workloads. Likewise, static replication strategies cannot deal effectively with the changing access patterns of contemporary applications. The study is motivated by these limitations of existing approaches and by the potential of AI to change how distributed databases manage resources and queries. We propose a new AI-enhanced architecture for distributed databases that applies Machine Learning (ML) and Deep Learning (DL) to: (1) predict query execution plans from historical data, (2) optimize replication strategies in real time, (3) adapt dynamically to workload changes, and (4) improve fault tolerance and system robustness. Experimental evaluation on a simulated high-throughput e-commerce workload shows that the AI-enhanced system outperforms legacy setups, improving query latency by up to 40% and replication efficiency by up to 30%. These improvements are validated against industry-standard benchmarks.

