Designing a Secure ETL Architecture for Integrating Multi-Source Healthcare Data
DOI:
https://doi.org/10.63282/3050-9262.IJAIDSML-V4I1P111
Keywords:
Compression A Safe Way To Transfer Data, Health Care Data Synthesis, Data Security, HIPAA, General Data Protection Regulation (GDPR), Blockchain, Data Pipeline Protection
Abstract
The rapid growth of healthcare data generated by electronic health records (EHRs), wearable Internet of Medical Things (IoMT) devices, imaging systems, and laboratory information management systems poses a significant integration challenge for modern healthcare organizations. Traditional extract-transform-load (ETL) models, originally designed for business intelligence, pay little attention to the strict confidentiality, integrity, and availability requirements that regulations such as the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) impose on sensitive health information. This paper addresses these limitations by introducing the Secure ETL Architecture (SETA), which builds security, privacy, and compliance controls directly into the data integration process. The proposed architecture uses AES-256 encryption, blockchain-based provenance tracking, differential privacy, and role-based access control (RBAC) to create a secure, auditable, and scalable environment for healthcare data flows. SETA was implemented with Apache NiFi and Apache Airflow in a hybrid cloud and on-premises deployment. In a performance assessment on synthetic datasets resembling multi-source hospital systems, SETA showed a 28.9 percent throughput improvement, a 22.3 percent reduction in processing latency, and a 40.0 percent improvement in compliance auditability compared with a traditional ETL process. These results support the claim that integrating cybersecurity principles into ETL architectures can improve performance without compromising strict data protection requirements. The proposed architecture provides a blueprint for secure, regulation-compliant healthcare data integration and a foundation for future research on federated and decentralized ETL systems.
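To make three of the controls named in the abstract more concrete, the following is a minimal Python sketch and not the SETA implementation. It combines a role-based access check, an append-only hash chain standing in for blockchain-based provenance tracking, and the Laplace mechanism as one common form of differential privacy. All names (`ROLE_PERMISSIONS`, `ProvenanceLedger`, `run_step`, `dp_count`), the roles, and the batch identifier are hypothetical. AES-256 encryption is left out because it would need a third-party cryptography library.

```python
import hashlib
import json
import random

# Hypothetical role-to-permission map; roles and actions are illustrative only.
ROLE_PERMISSIONS = {
    "etl_service": {"extract", "transform", "load"},
    "auditor": {"read_ledger"},
}


def is_allowed(role, action):
    """RBAC check: a role may perform only the actions granted to it."""
    return action in ROLE_PERMISSIONS.get(role, set())


class ProvenanceLedger:
    """Append-only hash chain: each entry commits to the previous entry's hash.

    This is a single-node stand-in for a blockchain ledger (no consensus,
    no distribution); it only demonstrates tamper evidence.
    """

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, event):
        payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry_hash = self._digest(prev_hash, event)
        self.entries.append({"prev": prev_hash, "event": event, "hash": entry_hash})
        return entry_hash

    def verify(self):
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


def run_step(role, action, batch_id, ledger):
    """Authorize a pipeline step, then record it in the provenance ledger."""
    if not is_allowed(role, action):
        raise PermissionError(f"role {role!r} may not perform {action!r}")
    return ledger.append({"actor": role, "action": action, "batch": batch_id})


def dp_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1).

    The difference of two i.i.d. exponential samples with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


if __name__ == "__main__":
    ledger = ProvenanceLedger()
    for step in ("extract", "transform", "load"):
        run_step("etl_service", step, "batch-001", ledger)
    print("chain valid:", ledger.verify())
    ledger.entries[1]["event"]["action"] = "tampered"  # simulate an edit
    print("chain valid after tampering:", ledger.verify())
    print("noisy patient count:", dp_count(120, epsilon=1.0, rng=random.Random(0)))
```

In this sketch the access check runs before the ledger write, so a rejected step leaves no provenance entry. Whether denied attempts should also be logged for audit purposes is a design choice the abstract does not specify.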