Methods of Interpretability of Deep Neural Networks in Decision-Making Tasks
DOI: https://doi.org/10.63282/3050-9262.IJAIDSML-V6I4P119

Keywords: Explainable AI, Interpretability, Deep Neural Networks, Saliency Methods, Counterfactual Explanations, Differential Privacy, Decision‑Making

Abstract
A lack of interpretability remains the main obstacle to deploying deep neural networks in high‑stakes settings: even when models achieve high accuracy, regulators and users require explanations they can understand. This review summarizes evidence on which explanation methods genuinely support decision‑making. A PRISMA‑based screening selected fifteen peer‑reviewed experimental studies published between 2017 and 2025. Saliency, attribution, counterfactual, and inherently transparent techniques were grouped into comparable categories. Where confusion matrices were publicly available, effect sizes were recalculated from them; narrative synthesis filled in the remaining gaps. No new data were collected. Results were compared on fidelity, stability, cognitive load, and privacy tolerance. Three consistent signals emerged. Gradient‑based visualizations excelled on medical images, while additive neural models led in credit‑risk scenarios, confirming that methods must be matched to tasks. Explanation stability dropped by about 25% when differential privacy budgets fell below ε = 3, even though accuracy remained stable. The median fidelity gap for the top method per task was 2.3%. User‑trust scores increased when brief counterfactual explanations accompanied saliency heatmaps, and combining the two consistently improved auditability. The synthesis yields a “Context–Layer–Fit” matrix that treats interpretability as a design requirement rather than an afterthought; engineers and policymakers can use it to choose transparent pipelines that balance accuracy, privacy, and cognitive effort. Future research should test robustness under data shift, establish combined quality–latency benchmarks, and promote open‑source explanation tooling.
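As a minimal sketch of the effect‑size recalculation step mentioned above, the snippet below recomputes two common effect‑size measures from a published 2×2 confusion matrix. The abstract does not name the specific statistics used, so the choice of the Matthews correlation coefficient and Cohen's h, as well as the example counts, are assumptions made purely for illustration.

```python
import math

def effect_size_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Recompute simple effect sizes from a reported 2x2 confusion matrix.

    Returns accuracy, the Matthews correlation coefficient (MCC), and
    Cohen's h for the sensitivity vs. false-positive-rate contrast.
    (Illustrative choices; the review does not specify its measures.)
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # MCC: a balanced effect size in [-1, 1], robust to class imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else 0.0

    # Cohen's h: arcsine-transformed difference between two proportions,
    # here sensitivity (true positive rate) and false positive rate.
    sensitivity = tp / (tp + fn)
    fpr = fp / (fp + tn)
    cohens_h = 2 * math.asin(math.sqrt(sensitivity)) - 2 * math.asin(math.sqrt(fpr))

    return {"accuracy": accuracy, "mcc": mcc, "cohens_h": cohens_h}

# Hypothetical counts standing in for a confusion matrix reported in one study.
print(effect_size_from_confusion(tp=85, fp=10, fn=15, tn=90))
```

Reporting a correlation‑style effect size alongside accuracy is one way such a review could compare studies with very different class balances on a common scale.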