Generative Scene Graphs for Explainable Perception in Autonomous Vehicles

Authors

  • Gaurav Pokharkar, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V6I4P125

Keywords:

ADAS, Autonomous Vehicles, Scene Understanding, Scene Graphs, Generative Models, Explainable Artificial Intelligence (XAI)

Abstract

Perception systems in autonomous and advanced driver-assistance vehicles increasingly rely on large, data-driven neural architectures that achieve strong accuracy but remain fundamentally opaque. Their internal reasoning is difficult to interpret, verify, or trace, which poses challenges for safety certification, debugging, and regulatory transparency. Existing attempts at interpretable perception, such as symbolic reasoning, attention visualization, or post-hoc saliency, rarely provide structured, causally meaningful explanations that planners, auditors, or human operators can reliably trust. This paper introduces a generative perception framework that produces a fully interpretable scene-graph representation as the primary output rather than as an optional diagnostic layer. The scene graph encodes objects, semantic attributes, relations, interactions, and driving-relevant affordances in a structured form compatible with downstream decision making and formal analysis. The proposed approach employs a generative model that operates in a graph latent space to enforce global physical and semantic consistency. Instead of passively extracting relations, the model actively predicts missing, uncertain, or occluded components while maintaining adherence to vehicle-dynamics constraints, traffic rules, and common-sense priors learned from data. This generative mechanism allows the perception system to expose uncertainty at the node, relation, and affordance levels, enabling explicit traceability of potential failure modes. The resulting graph structure serves both as an interpretable explanation of the system's perception and as a robust intermediate representation for planning. Experiments conducted on multi-sensor autonomous driving datasets demonstrate that generative scene graphs substantially improve explanation quality and relational correctness, especially under occlusion and degraded sensing. At the same time, detection performance remains competitive with state-of-the-art black-box methods. By unifying generative modeling, relational reasoning, and structured explainability, this work positions generative scene graphs as a practical step toward transparent, auditable, and regulator-aligned perception pipelines in autonomous vehicles.
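The abstract describes a scene graph whose nodes, relations, and affordances each carry explicit uncertainty that downstream planners and auditors can inspect. As a rough illustration only (the paper's actual representation is not detailed here; all class and field names below are hypothetical), such a structure might be sketched as:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    obj_id: int
    category: str      # e.g. "vehicle", "pedestrian"
    attributes: dict   # semantic attributes, e.g. {"occluded": True}
    confidence: float  # node-level uncertainty in [0, 1]

@dataclass
class Relation:
    subj: int          # obj_id of the subject node
    pred: str          # e.g. "yields_to", "occludes"
    obj: int           # obj_id of the object node
    confidence: float  # relation-level uncertainty in [0, 1]

@dataclass
class SceneGraph:
    nodes: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def low_confidence(self, threshold: float = 0.5):
        """Expose uncertain components for audit and traceability."""
        return ([n for n in self.nodes if n.confidence < threshold],
                [r for r in self.relations if r.confidence < threshold])

# Example: a confidently detected vehicle, a partially occluded pedestrian,
# and an uncertain interaction between them.
g = SceneGraph()
g.nodes.append(Node(0, "vehicle", {"moving": True}, 0.95))
g.nodes.append(Node(1, "pedestrian", {"occluded": True}, 0.40))
g.relations.append(Relation(0, "yields_to", 1, 0.35))
uncertain_nodes, uncertain_rels = g.low_confidence()
```

In this sketch, querying `low_confidence()` surfaces exactly the components a planner or auditor would want flagged: the occluded pedestrian node and the uncertain yielding relation.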

References

[1] G. Pokharkar, “Design and evaluation of ai safety mechanisms in adas and autonomous vehicle architectures,” International Journal of Emerging Trends in Computer Science and Information Technology, pp. 57–67, Sep. 2025. [Online]. Available: https://ijetcsit.org/index.php/ijetcsit/article/view/388

[2] “Scenario-based validation for sae level 2+ features using simulation-in-the-loop (sil) systems,” International Journal of Innovative Research and Creative Technology, vol. 11, no. 4, Jul. 2025. [Online]. Available: https://doi.org/10.5281/zenodo.16883284

[3] Y. Zhou and O. Tuzel, “Voxelnet: End-to-end learning for point cloud based 3d object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[4] A. H. Lang, S. Vora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “Pointpillars: Fast encoders for object detection from point clouds,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

[5] Y. Wang, S. Shi, X. Li et al., “Detr3d: 3d object detection from multi-view images via 3d-to-2d queries,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.

[6] Y. Li, L. Chen, A. Dai et al., “Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers,” in European Conference on Computer Vision (ECCV), 2022.

[7] J. Huang, Y. Tan, W. Chen et al., “Bevdet4d: Exploiting temporal cues for multi-camera 3d object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

[8] T. Yin, X. Zhou, and P. Krahenbuhl, “Centerpoint: A center-based 3d object detector,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.

[9] Z. Liu, T. Hu, R. Xu et al., “Petr: Position embedding transformation for multi-view 3d object detection,” in European Conference on Computer Vision (ECCV), 2022.

[10] D. Bogdoll, M. Nitsche, and J. M. Zöllner, “Anomaly detection in autonomous driving: A survey,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 4488–4499.

[11] B. Ivanovic and M. Pavone, “Injecting planning-awareness into prediction and detection evaluation,” in 2022 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2022, pp. 821–828.

[12] R. Krishna, Y. Zhu, O. Groth et al., “Visual genome: Connecting language and vision using crowdsourced dense image annotations,” International Journal of Computer Vision (IJCV), vol. 123, no. 1, pp. 32–73, 2017.

[13] R. Zellers, M. Yatskar, S. Thomson, and Y. Choi, “Neural motifs: Scene graph parsing with global context,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[14] J. Yang, J. Lu, S. Lee, D. Batra, and D. Parikh, “Graph r-cnn for scene graph generation,” in European Conference on Computer Vision (ECCV), 2019.

[15] H. Zipfl, J. Gruber, C. Sakaridis et al., “Relation-based motion prediction using traffic scene graphs,” Technical Report, 2022.

[16] J. Wang, C. Li, Z. Hou et al., “Rs2g: Data-driven scene-graph extraction and embedding for robust autonomous perception,” in Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2024.

[17] L. Greve, P. Mersch, and S. Behnke, “Curb-sg: Collaborative dynamic 3d scene graphs for automated driving,” in IEEE International Conference on Robotics and Automation (ICRA), 2023.

[18] C. Lv, H. Liu, M. Zhao et al., “T2sg: Traffic topology scene graph for topology reasoning in autonomous driving,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

[19] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems (NeurIPS), 2020.

[20] R. Rombach, A. Blattmann, D. Lorenz et al., “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[21] Y. Yuan, C. Jiang, H. Xu et al., “Diffusion-based trajectory prediction for autonomous driving,” in International Conference on Machine Learning (ICML), 2023.

[22] H. Chen, N. Werner, W. Tan et al., “Generative lidar simulation using diffusion models,” arXiv preprint arXiv:2403.01452, 2024.

[23] L. Wang, R. Zhao, V. Shah et al., “Generative ai for autonomous driving: A review,” arXiv preprint arXiv:2505.15863, 2025.

[24] J. Johnson, A. Gupta, and L. Fei-Fei, “Image generation from scene graphs,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1219–1228.

[25] C. Vignac, I. Krawczuk, A. Siraudin, B. Wang, V. Cevher, and P. Frossard, “Digress: Discrete denoising diffusion for graph generation,” arXiv preprint arXiv:2209.14734, 2022.

[26] Z. Su, C. Wang, D. Bradley, C. Vallespi-Gonzalez, C. Wellington, and N. Djuric, “Convolutions for spatial interaction modeling,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 6583–6592.

[27] H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, “nuscenes: A multimodal dataset for autonomous driving,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11621–11631.

[28] M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, D. Wang, P. Carr, S. Lucey, D. Ramanan et al., “Argoverse: 3d tracking and forecasting with rich maps,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 8748–8757.

[29] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in perception for autonomous driving: Waymo open dataset,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 2446–2454.

[30] M. Zipfl and J. M. Zöllner, “Towards traffic scene description: The semantic scene graph,” in 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2022, pp. 3748–3755.

[31] S. R. Mukkala, “A proficient hospital ratings aware patient churn prediction and prevention system using Abg-Fuzzy and Ner-Gfjdkmeans,” Educational Administration: Theory and Practice, vol. 29, no. 3, pp. 1407–1424, 2023, doi: 10.53555/kuey.v29i3.9511.

Published

2025-12-17

Issue

Section

Articles

How to Cite

1. Pokharkar G. Generative Scene Graphs for Explainable Perception in Autonomous Vehicles. IJAIDSML [Internet]. 2025 Dec. 17 [cited 2026 Mar. 9];6(4):183-92. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/377