ANCHOR-GEN: Unsupervised Multimodal Generation via Latent Cross-Domain Anchors
DOI: https://doi.org/10.63282/3050-9262.IJAIDSML-V2I1P109

Keywords: Unsupervised Generation, Multimodal Learning, Federated Learning, Trust Metrics, Accountability, Robust Aggregation, Explainable AI, Controllability

Abstract
Unsupervised multimodal generation aims to synthesize coherent content across domains and modalities (e.g., image and text, audio and video) without relying on paired training data. While recent generative models demonstrate strong perceptual quality, practical deployment across organizational silos remains constrained by privacy, governance, and trust: data cannot be centralized, model updates may be unreliable, and the most accurate models are often the least explainable. This paper proposes ANCHOR-GEN, a novel unsupervised multimodal generation framework built around latent cross-domain anchors: compact, interpretable latent factors that align semantically across modalities while remaining learnable from unpaired data. ANCHOR-GEN couples (i) anchor discovery, (ii) anchor-consistent multimodal generation, and (iii) anchor-based explanations that expose controllable semantic dimensions of generation. To support deployment across silos, we introduce a trust-metric-based federated learning framework that enforces integrity and accountability through update provenance, robust aggregation, privacy-preserving protocols, and auditable trust scoring. Finally, we present a practical framework to quantify and optimize the trade-off between explainability and performance using lightweight metrics based on anchor stability, concept alignment, and utility retention. Experiments on standard unpaired cross-domain settings demonstrate that anchorization improves cross-domain consistency and controllability while enabling measurable explainability with minimal performance loss. The proposed federated trust layer reduces sensitivity to low-quality or adversarial updates and provides accountability without centralizing data.
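The abstract's trust layer combines trust scoring with robust aggregation to limit the influence of low-quality or adversarial updates. A minimal sketch of one plausible realization follows; the function name, the trimming rule, and the use of trust scores directly as aggregation weights are illustrative assumptions, not the paper's specified protocol.

```python
import numpy as np

def trust_weighted_aggregate(updates, trust_scores, trim_frac=0.2):
    """Hypothetical robust aggregation: drop the least-trusted fraction
    of clients, then average the rest weighted by their trust scores."""
    updates = np.asarray(updates, dtype=float)   # shape: (clients, params)
    scores = np.asarray(trust_scores, dtype=float)
    # Remove the trim_frac lowest-trust clients entirely.
    k = int(len(updates) * trim_frac)
    keep = np.argsort(scores)[k:]
    updates, scores = updates[keep], scores[keep]
    # Remaining trust scores act as normalized aggregation weights.
    weights = scores / scores.sum()
    return weights @ updates

# One adversarial client submits a huge update and carries a low trust score.
clients = [[1.0, 1.0], [1.1, 0.9], [100.0, -100.0], [0.9, 1.2]]
trust   = [0.9, 0.8, 0.05, 0.85]
agg = trust_weighted_aggregate(clients, trust, trim_frac=0.25)
```

Because the adversarial client is both trimmed and down-weighted, the aggregate stays close to the honest clients' consensus instead of being dragged toward the outlier.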
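The abstract also proposes lightweight metrics (anchor stability, utility retention) for the explainability-performance trade-off. The sketch below shows one simple way such metrics could be computed; the cosine-similarity definition of stability and the ratio definition of retention are assumptions for illustration.

```python
import numpy as np

def anchor_stability(anchors_a, anchors_b):
    """Mean cosine similarity between matched anchor vectors from two
    training runs; values near 1.0 suggest stable, reusable anchors."""
    a = np.asarray(anchors_a, dtype=float)
    b = np.asarray(anchors_b, dtype=float)
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(cos.mean())

def utility_retention(score_anchored, score_unconstrained):
    """Fraction of task performance kept after imposing anchor constraints."""
    return score_anchored / score_unconstrained

# Toy example: anchors barely move between runs, utility drops slightly.
run1 = [[1.0, 0.0], [0.0, 1.0]]
run2 = [[0.9, 0.1], [0.1, 0.9]]
stab = anchor_stability(run1, run2)   # high: anchors are nearly unchanged
ret = utility_retention(0.92, 0.95)   # ~0.97 of utility retained
```

Metrics of this form are cheap to log during training, which is what makes the trade-off quantifiable rather than anecdotal.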