The Convergence of Data Virtualization and Federated Learning in Pharmaceutical Real-World Evidence (RWE) Generation: A Survey and Gap Research in Architectures, Tools, and Governance Challenges

Authors

  • Pinaki Bose, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V7I1P135

Keywords:

Federated Learning, Data Virtualization, Real-World Evidence (RWE), Health Informatics, Data Architecture, Data Governance, OMOP, Data Mesh

Abstract

The generation of pharmaceutical Real-World Evidence (RWE) is a critical imperative for modern healthcare, yet it is hindered by two fundamental bottlenecks: (1) stringent privacy regulations, such as HIPAA, which preclude data centralization, and (2) pervasive, systemic data fragmentation within healthcare institutions [2]. Federated Learning (FL) has emerged as the consensus solution to the privacy challenge, enabling model training on distributed data. Concurrently, Data Virtualization (DV) is the industry-standard solution to data fragmentation, providing a unified logical view over data silos. The current scientific literature, however, investigates these two solutions in parallel and leaves the gap at their intersection unaddressed. FL research implicitly operates on a "Unified Node Assumption": it presumes that each participating hospital holds its data in a single, queryable repository, which rarely holds in practice. This paper bridges the gap by conducting a systematic survey of both domains and proposing the first architectural taxonomy of "Federated Learning on a Virtualized Data Layer" (FL-on-VD). Three novel architectural models are proposed and illustrated: (1) FL with Virtualized Query Pushing (FL-VQP), (2) FL on a Logical Data Mesh (FL-LDM), and (3) FL on a Centralized Virtual-View (FL-CVV). The paper then analyzes the second-order challenges that this convergence creates in federated query optimization, semantic interoperability, and "double-blind" governance. It concludes that the unified FL-on-VD architecture represents the only viable and scalable path toward national-level RWE generation.
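The general idea of training a federated model on top of a virtualized view can be illustrated with a toy sketch. It is not one of the three models named above, whose details this abstract does not give. It assumes a hypothetical `VirtualView` class that joins two in-memory silos per hospital at query time, a one-parameter linear model, and plain federated averaging weighted by row count; all names and data are invented for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, float]  # (feature x, outcome y)


class VirtualView:
    """Hypothetical stand-in for a data-virtualization layer.

    It joins two separate silos on patient id at query time,
    without copying either silo into a unified store.
    """

    def __init__(self, features: Dict[str, float], outcomes: Dict[str, float]):
        self._features = features  # e.g. a lab-system silo
        self._outcomes = outcomes  # e.g. a claims-system silo

    def rows(self) -> List[Row]:
        # Only patients present in both silos yield a training row.
        shared = sorted(self._features.keys() & self._outcomes.keys())
        return [(self._features[p], self._outcomes[p]) for p in shared]


def local_update(w: float, rows: List[Row], lr: float = 0.05, epochs: int = 20) -> float:
    """Gradient descent on y ~ w * x using only this site's rows."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
        w -= lr * grad
    return w


def fed_avg(w: float, views: List[VirtualView], rounds: int = 10) -> float:
    """Federated averaging: only weights and row counts leave each site."""
    for _ in range(rounds):
        updates = []
        for view in views:
            rows = view.rows()
            if rows:  # skip sites whose view is empty
                updates.append((local_update(w, rows), len(rows)))
        total = sum(n for _, n in updates)
        w = sum(w_i * n for w_i, n in updates) / total
    return w


# Two hypothetical hospitals, each with fragmented silos (true slope is 2).
site_a = VirtualView({"p1": 1.0, "p2": 2.0, "p3": 3.0}, {"p1": 2.0, "p2": 4.0})
site_b = VirtualView({"p4": 1.5, "p5": 2.5}, {"p4": 3.0, "p5": 5.0})

global_w = fed_avg(0.0, [site_a, site_b])
print(f"global weight: {global_w:.4f}")
```

In this sketch the training loop never sees the silos directly, only the rows the view returns, which is the property the "Unified Node Assumption" takes for granted. A real deployment would replace `VirtualView` with an actual virtualization engine and add secure aggregation.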

References

[1] DelveInsight, "Artificial intelligence in drug commercialization: Accelerating market success through data-driven precision," DelveInsight Blog, 2024. [Online]. Available: https://www.delveinsight.com/blog/artificial-intelligence-in-drug-commercialization

[2] N. D. Heiger, C. R. Thompson, and J. S. Brown, "The current landscape and emerging applications for real‐world data in diagnostics and clinical decision support and its impact on regulatory decision making," Clin. Pharmacol. Ther., vol. 113, no. 1, pp. 31–36, Jan. 2023, doi: 10.1002/cpt.2783.

[3] M. R. Al-Zahrani and M. M. S. Al-Majeed, "Convergence of integrated sensing and communication (ISAC) and digital-twin technologies in healthcare systems: A comprehensive review," Healthcare, vol. 6, no. 4, Art. no. 51, 2024, doi: 10.3390/healthcare6040051.

[4] S. Zhang et al., "Federated causal inference in healthcare: Methods, challenges, and opportunities," arXiv preprint arXiv:2505.02238, 2025.

[5] Duke-Margolis Institute for Health Policy, "Real-world evidence," Duke-Margolis Healthcare Topics, 2024. [Online]. Available: https://healthpolicy.duke.edu/topics/real-world-evidence

[6] P. K. Suri and P. Singh, "A theoretical exploration of data management and integration in organization sectors," Int. J. Data Mining Knowl. Manag. Process, vol. 11, no. 1, pp. 31–45, Jan. 2021, doi: 10.5121/ijdms.2019.11103.

[7] Denodo Technologies, "Healthcare data management: Modernizing healthcare with data virtualization," 2024. [Online]. Available: https://www.denodo.com/en/solutions/by-industry/healthcare

[8] A. Sharma et al., "An advanced data fabric architecture leveraging generative AI and metadata-driven automation," arXiv preprint arXiv:2402.09795, 2024.

[9] L. Wang et al., "Ontology- and LLM-based data harmonization for federated learning in healthcare," arXiv preprint arXiv:2505.20020, 2025.

[10] TEHDAS, "Report on EHDS architecture and infrastructure implementers expectations and experiences," Joint Action Towards the European Health Data Space, Rep., 2022. [Online]. Available: https://tehdas.eu/app/uploads/2022/06/tehdas-report-on-ehds-architecture-and-infrastructure-implementers-expectations-experiences.pdf

[11] M. A. Alzahrani, "Framework of big data analytics in real time for healthcare enterprise performance measurements," Ph.D. dissertation, Dept. Industrial Eng. & Management Syst., Univ. Central Florida, Orlando, FL, USA, 2021.

[12] R. S. S. S. Prasad, "The best practice of big data architecture in a health care organization," ResearchGate, Oct. 2014. [Online]. Available: https://www.researchgate.net/figure/The-best-practice-of-big-data-architecture-in-a-health-care-organization_fig1_266613537

[13] Orion Innovation, "Data virtualization: Accelerating self-service analytics with a unified semantic layer," Orion Case Studies, 2023. [Online]. Available: https://www.orioninc.com/case-studies/accelerating-self-service-analytics-with-a-unified-semantic-layer/

[14] Denodo Technologies, "Data virtualization can deliver ROI of 408% according to new independent research study," Denodo Press Release, Nov. 30, 2021. [Online]. Available: https://www.denodo.com/en/press-release/2021-11-30/data-virtualization-can-deliver-roi-408-according-new-independent-research

[15] Oracle Corporation, "Oracle big data SQL," Oracle Datasheet, 2020. [Online]. Available: https://www.oracle.com/docs/tech/database/bigdatasql-datasheet.pdf

[17] Capgemini, "TechnoVision 2024: CTIO report," Capgemini Technical Report, 2024. [Online]. Available: https://www.scribd.com/document/733241240/TechnoVision-2024-CTIO-Report-Web-Version

[18] Gathr.ai, "The silent revolution of invisible AI," Gathr Blog, 2024. [Online]. Available: https://www.gathr.ai/blog/the-silent-revolution-of-invisible-ai/

[19] World Bank, "Harnessing data for better lives," World Development Report 2021, World Bank Group, Washington, DC, USA, 2021. [Online]. Available: https://openknowledge.worldbank.org/handle/10986/35218

[20] M. Muniswamaiah, T. Agerwala, and C. C. Tappert, "Federated query processing for big data in data science," in Proc. IEEE 6th Int. Conf. Big Data Comput. Serv. Appl. (BigDataService), 2020, pp. 115–118, doi: 10.1109/BigDataService49289.2020.00025.

[21] A. B. Researcher et al., "Rethinking pluggable federated query optimization: From laptops to data warehouses," in Proc. VLDB Workshops 2025 (CDMS), 2025. [Online]. Available: https://www.vldb.org/2025/Workshops/VLDB-Workshops-2025/CDMS/CDMS25_07.pdf

[22] Horizon-Trustee Project, "D2.1 Live doc conceptualisation, use cases and system architecture V1," EU Horizon Europe Deliverable, 2024. [Online]. Available: https://horizon-trustee.eu/wp-content/uploads/2024/06/D2.1-Live-doc-conceptualisation-use-cases-and-system-architecture-V1.pdf

[23] IBM Cloud, "Denodo connection," IBM Cloud Pak for Data Documentation, 2024. [Online]. Available: https://eu-gb.dataplatform.cloud.ibm.com/docs/content/wsj/manage-data/conn-denodo.html

[24] Journal Press, "Federated data governance for cross-institution anti-money laundering," London J. Eng. Res., vol. 25, 2024. [Online]. Available: https://journalspress.com/LJER_Volume25/Federated-Data-Governance.pdf

Published

2026-02-23

Section

Articles

How to Cite

Bose P. The Convergence of Data Virtualization and Federated Learning in Pharmaceutical Real-World Evidence (RWE) Generation: A Survey and Gap Research in Architectures, Tools, and Governance Challenges. IJAIDSML [Internet]. 2026 Feb. 23 [cited 2026 Feb. 26];7(1):209-16. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/450