KFP v2 Artifact-Centric ML Pipeline Governance
DOI:
https://doi.org/10.63282/3050-9262.IJAIDSML-V4I2P116
Keywords:
Kubeflow Pipelines, ML Governance, Artifact Management, MLOps, Metadata Tracking, Model Provenance, Workflow Automation, Compliance
Abstract
Kubeflow Pipelines (KFP) v2 is a significant evolution of the platform and creates an opportunity to address one of the most difficult problems in MLOps: governing the machine learning (ML) artifacts that drive model lifecycles. This paper introduces an artifact-centric governance framework designed to improve the traceability, reproducibility, compliance, and auditability of KFP v2-based ML workflows. Pipeline tracking systems typically focus on execution metadata and neglect the detailed relationships among datasets, models, and metrics that define a pipeline’s lineage. By moving the focus of governance to the artifact level, the proposed approach enables exact versioning, dependency mapping, and integrity checking at the level of any pipeline component. The aim of this research is to propose a reference model for artifact governance that spans data ingestion through model deployment and embeds metadata management, lineage visualization, and compliance controls in the KFP ecosystem. The framework establishes policy-driven artifact tracking, an enriched metadata schema, and provenance-aware logging mechanisms, which together ensure that each artifact’s creation, transformation, and consumption are documented in detail. This supports collaboration and audit readiness, and it helps organizations implement AI governance and regulatory compliance initiatives. Experimental validation shows that adopting an artifact-centric governance model reduces the complexity of governance tasks while improving reproducibility and the speed of audit response.
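The artifact-level versioning, dependency mapping, and integrity checking described in the abstract can be illustrated with a minimal sketch. It is not the framework's implementation and does not use the KFP SDK: the names `ArtifactRecord` and `LineageLog`, the step names, and the byte contents are all hypothetical, and an in-memory dictionary stands in for a real metadata store. The sketch assumes that each artifact is identified by the SHA-256 digest of its content and that lineage is recorded as links to parent digests.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactRecord:
    """Hypothetical provenance entry for one pipeline artifact."""
    name: str
    kind: str            # e.g. "dataset", "model", "metrics"
    digest: str          # SHA-256 of the artifact content (acts as its version)
    produced_by: str     # name of the pipeline step that created the artifact
    parents: tuple = ()  # digests of the artifacts this one was derived from


class LineageLog:
    """In-memory stand-in for a metadata store, keyed by content digest."""

    def __init__(self):
        self._records = {}

    def record(self, name, kind, content, produced_by, parents=()):
        """Log the creation of an artifact and link it to its inputs."""
        for parent in parents:
            if parent.digest not in self._records:
                raise KeyError(f"unknown parent artifact: {parent.name}")
        rec = ArtifactRecord(
            name=name,
            kind=kind,
            digest=hashlib.sha256(content).hexdigest(),
            produced_by=produced_by,
            parents=tuple(p.digest for p in parents),
        )
        self._records[rec.digest] = rec
        return rec

    def verify(self, rec, content):
        """Integrity check: does the content still match the logged digest?"""
        return hashlib.sha256(content).hexdigest() == rec.digest

    def lineage(self, rec):
        """Return the names of all upstream artifacts, nearest first."""
        seen, names, stack = set(), [], list(rec.parents)
        while stack:
            digest = stack.pop()
            if digest in seen:
                continue
            seen.add(digest)
            parent = self._records[digest]
            names.append(parent.name)
            stack.extend(parent.parents)
        return names


# Hypothetical three-step pipeline: ingest -> preprocess -> train.
log = LineageLog()
raw = log.record("raw_data", "dataset", b"a,b\n1,2\n", "ingest")
clean = log.record("clean_data", "dataset", b"a,b\n1,2", "preprocess", [raw])
model = log.record("model", "model", b"weights-v1", "train", [clean])

print(log.lineage(model))                  # upstream artifacts of the model
print(log.verify(model, b"weights-v1"))    # unchanged content passes
print(log.verify(model, b"weights-v2"))    # modified content fails
```

Keying records by content digest means that any change to an artifact yields a new version automatically, and an audit can walk `parents` links from a deployed model back to the data it was trained on.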