Big Data Pipeline Optimisation for Electronic Health Records (EHR)

Authors

  • Chitiz Tayal, Senior Director, Data and AI

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V5I3P113

Keywords:

Electronic Health Records, Big Data, Big Data Pipelines, Machine Learning, Support Vector Regression, K-Means Clustering Algorithm

Abstract

The widespread adoption of Electronic Health Records (EHRs) has transformed healthcare organisations worldwide. The resulting big data are a major input to predictive analysis, which supports efficient patient care and the early mitigation of potential risks. Machine learning algorithms benefit the healthcare system by enabling in-depth analysis of large datasets with reduced complexity, which leads to automated models that support swift and efficient treatment of patients. Although advantageous for individuals in healthcare, these approaches lag in ensuring the privacy and anonymity of patient data. Future research should address these ethical shortcomings and thereby improve clinical trustworthiness. Machine learning approaches such as Support Vector Regression (SVR) and K-means clustering have been used to derive precise insights from patient records, and these insights inform the training of the machine learning model, supporting efficient patient service. In addition, the analysis indicates that optimising the big data pipeline for electronic health records is important for enhancing the operational excellence of the healthcare system and for standardising entities so that patients receive better treatment. In future work, new artificial intelligence and deep learning approaches could further optimise the functionality and features of electronic health records with greater accuracy.
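
As an illustration of the clustering step named in the abstract, the following is a minimal sketch of K-means (Lloyd's algorithm) applied to a handful of made-up records. The feature names (age in years, length of stay in days), the values, and the function name `kmeans` are assumptions for illustration only and are not taken from the paper; the paper's own data, pipeline, and tooling are not described on this page.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k sampled records
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each record goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical records: (age_years, length_of_stay_days). Not real patient data.
records = [(25, 2), (30, 3), (28, 2), (72, 9), (80, 11), (76, 10)]

centroids, labels = kmeans(records, k=2)
for record, label in zip(records, labels):
    print(record, "-> cluster", label)
```

On real EHR data the features would normally be scaled before clustering, since K-means relies on Euclidean distance and an unscaled feature with a large range (such as age) would dominate the result.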

Published

2024-10-30

Section

Articles

How to Cite

1. Tayal C. Big Data Pipeline Optimisation for Electronic Health Records (EHR). IJAIDSML [Internet]. 2024 Oct. 30 [cited 2025 Dec. 7];5(3):121-7. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/309