Privacy Preserving Machine Learning and Data Governance for AI Systems

Authors

  • Rashi Nimesh Kumar Dhenia, Independent Researcher, USA
  • Raghavendra Sridhar, Independent Researcher, USA
  • Ishva Jitendrakumar Kanani, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V5I4P121

Keywords:

Privacy-Preserving Machine Learning (PPML), Cryptographic Techniques, Decentralized Training Paradigms, Data Governance

Abstract

As machine learning permeates sensitive domains such as healthcare, finance, and government, protecting individual privacy while leveraging large-scale data remains a paramount challenge. Privacy-Preserving Machine Learning (PPML) combines cryptographic techniques, decentralized training paradigms, and data governance policies to enable secure and compliant model development. This paper surveys the fundamental PPML methods (differential privacy, federated learning, and homomorphic encryption) and examines the key data governance frameworks that underpin ethical AI adoption. We analyze technical trade-offs, including the privacy-utility balance, scalability, and adversarial resilience. Finally, we discuss ongoing research directions and policy implications, emphasizing interdisciplinary collaboration for trustworthy AI deployment.
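
To make the privacy-utility balance named in the abstract concrete, the following is a minimal sketch of one standard differential privacy building block, the Laplace mechanism, applied to a clipped mean. It is an illustration under stated assumptions, not the method evaluated in the paper: the helper names `laplace_noise` and `private_mean`, the clipping bounds, and the assumption that the record count is public are all hypothetical choices made for this example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Return an epsilon-differentially-private mean (hypothetical helper).

    Assumptions: each record is clipped to [lower, upper] and the number
    of records n is treated as public. Replacing one record then changes
    the clipped mean by at most (upper - lower) / n, which is the
    sensitivity used to calibrate the noise.
    """
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(values, lower, upper, epsilon, trials, rng):
    """Average absolute deviation of the private mean from the clipped mean."""
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    total = 0.0
    for _ in range(trials):
        total += abs(private_mean(values, lower, upper, epsilon, rng) - true_mean)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [30, 40, 50, 60, 70]  # made-up illustrative records
    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(ages, 0, 100, eps, trials=2000, rng=rng)
        print(f"epsilon={eps}: mean absolute error ~ {err:.2f}")
```

A smaller epsilon gives a stronger privacy guarantee but a larger expected error, because the noise scale is the sensitivity divided by epsilon. A production system would typically rely on a vetted differential privacy library rather than hand-rolled sampling.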

Published

2024-12-30

Issue

Vol. 5 No. 4 (2024)

Section

Articles

How to Cite

1. Kumar Dhenia RN, Sridhar R, Kanani IJ. Privacy Preserving Machine Learning and Data Governance for AI Systems. IJAIDSML [Internet]. 2024 Dec. 30 [cited 2026 Mar. 9];5(4):227-30. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/433