Designing a Scalable Data Lake Architecture on AWS Using Glue and S3

Authors

  • Karunakar Grandhe, Product Manager, Data Engineering & Analytics, New Jersey, USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V6I3P110

Keywords:

Data Lake, AWS Glue, Amazon S3, Cloud Architecture, ETL, Scalability, Big Data

Abstract

Data-driven enterprises need an efficient, low-cost, and scalable architecture for managing both structured and unstructured data. Cloud-based services have changed how companies manage big data. This article describes how to scale a proposed data lake architecture built on Amazon Web Services (AWS), using Amazon Simple Storage Service (S3) as the primary storage service and AWS Glue, whose built-in templates facilitate data integration and transformation. The study focuses on the system's architecture, its structure, and the performance and scalability of its operations.
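The abstract names S3 as the storage layer and Glue as the integration layer but does not show how data might be laid out in S3. As a minimal sketch, assuming a layered lake with hypothetical zone names (raw, curated, analytics) and date-based partitioning, the Python below builds object keys with Hive-style name=value folders, the convention AWS Glue crawlers use to infer partition columns. The function names, zone names, and bucket name are illustrative assumptions, not details from the article.

```python
from datetime import date
from typing import Dict, Mapping, Optional

# Hypothetical zone names for a layered data lake (assumption, not from the article).
ZONES = ("raw", "curated", "analytics")


def partitioned_key(zone: str, dataset: str, event_date: date, filename: str,
                    extra: Optional[Mapping[str, str]] = None) -> str:
    """Build an S3 object key using Hive-style name=value partition folders.

    Glue crawlers infer partition columns from folders named name=value, and
    query engines that filter on those columns can skip non-matching prefixes
    instead of scanning the whole dataset.
    """
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    parts = [
        zone,
        dataset,
        f"year={event_date.year:04d}",
        f"month={event_date.month:02d}",
        f"day={event_date.day:02d}",
    ]
    # Optional extra partition levels, e.g. {"region": "us-east-1"}.
    for name, value in (extra or {}).items():
        parts.append(f"{name}={value}")
    parts.append(filename)
    return "/".join(parts)


def parse_partitions(key: str) -> Dict[str, str]:
    """Recover partition values from the folder segments of a key (filename excluded)."""
    folders = key.split("/")[:-1]
    return dict(seg.split("=", 1) for seg in folders if "=" in seg)


def s3_uri(bucket: str, key: str) -> str:
    """Join a bucket name and object key into an s3:// URI."""
    return f"s3://{bucket}/{key}"
```

For example, partitioned_key("raw", "web_logs", date(2025, 9, 19), "part-0000.json") yields raw/web_logs/year=2025/month=09/day=19/part-0000.json; a crawler pointed at the raw/web_logs/ prefix would typically register year, month, and day as partition columns. Fixing the partition scheme per zone is a design choice that keeps the layout predictable as data volume grows.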

Published

2025-09-19

Issue

Vol. 6, No. 3 (2025)

Section

Articles

How to Cite

Grandhe K. Designing a Scalable Data Lake Architecture on AWS Using Glue and S3. IJAIDSML [Internet]. 2025 Sep. 19 [cited 2025 Sep. 27];6(3):60-3. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/256