AI-Powered Bug Triage Using Retrieval-Augmented Generation: A Weighted Confidence Scoring Approach with AWS Bedrock and Vector Search

Authors

  • Rajasekhar Sunkara, Independent Researcher, USA

DOI:

https://doi.org/10.63282/3050-9262.IJAIDSML-V6I2P125

Keywords:

Bug Triage, Retrieval-Augmented Generation, RAG, Large Language Model, AWS Bedrock, Anthropic Claude, Vector Search, Confidence Scoring, Knowledge Base, Graphics Engineering, Flask Dashboard, Root Cause Analysis

Abstract

Manual triage of incoming bug reports in a graphics engineering organization is expensive. A senior engineer reads the report, retrieves similar prior issues, consults architecture documentation, searches the source code for the relevant components, and forms an opinion on the likely root cause and on which team should own the fix. This work commonly takes hours per report. This paper describes an AI bug triage agent that performs an equivalent analysis automatically and produces a structured root cause hypothesis with a confidence score. The agent is built on AWS Bedrock and uses Anthropic Claude as the language model. Its Retrieval-Augmented Generation pipeline is grounded in a curated knowledge base of at least 1,100 previously resolved issues, together with architecture documentation and source code search. When crash logs are attached to the bug report, the agent analyzes them as well. The output is a root cause analysis with a confidence score derived from a weighted combination of five signals: historical pattern match against the knowledge base of resolved issues, source code match against the components implicated by the report, crash stack analysis, log evidence, and fix ownership. The weights adjust dynamically based on which signals are available for a given report. A Flask web dashboard exposes real-time triage status, analytics, filterable history, and routing views for issues that fall outside the team's scope. Deployment of the agent reduced manual triage latency from hours to minutes for the cases the agent handles end to end, while preserving analyst trust through transparent and inspectable confidence scoring. The paper describes the architecture, the scoring algorithm, the dashboard, and the operational discipline that keeps the agent useful.
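
The weighted confidence score can be illustrated with a minimal sketch. The abstract names the five signals but does not give the weights or the exact adjustment rule, so everything specific below is an assumption made for illustration: the weight values in BASE_WEIGHTS, the signal key names, the function name confidence_score, and the choice to renormalize the weights over whichever signals are present. Treat it as one plausible reading of "weights adjust dynamically", not as the paper's algorithm.

```python
from typing import Dict, Optional

# Hypothetical base weights for the five signals named in the abstract.
# The actual values used by the agent are not stated there.
BASE_WEIGHTS: Dict[str, float] = {
    "historical_match": 0.30,
    "source_code_match": 0.25,
    "crash_stack": 0.20,
    "log_evidence": 0.15,
    "fix_ownership": 0.10,
}


def confidence_score(signals: Dict[str, Optional[float]]) -> float:
    """Combine per-signal scores in [0, 1] into one confidence in [0, 1].

    A signal that is missing or None is treated as unavailable. Its weight
    is redistributed proportionally across the available signals
    (assumed renormalization rule).
    """
    available = {
        name: score
        for name, score in signals.items()
        if name in BASE_WEIGHTS and score is not None
    }
    if not available:
        return 0.0  # nothing to score against

    for name, score in available.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"signal {name!r} must be in [0, 1], got {score}")

    total_weight = sum(BASE_WEIGHTS[name] for name in available)
    weighted_sum = sum(BASE_WEIGHTS[name] * score for name, score in available.items())
    return weighted_sum / total_weight


# Example: a report with no crash log attached, so two signals are unavailable.
report_signals = {
    "historical_match": 0.8,
    "source_code_match": 0.6,
    "crash_stack": None,
    "log_evidence": None,
    "fix_ownership": 1.0,
}
score = confidence_score(report_signals)
```

Under this assumed rule, a report without crash logs is scored only on the three remaining signals, so a missing attachment does not by itself lower the confidence. Exposing the per-signal scores alongside the combined value is one way to keep the result inspectable for analysts.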

Published

2025-06-08

Section

Articles

How to Cite

1. Sunkara R. AI-Powered Bug Triage Using Retrieval-Augmented Generation: A Weighted Confidence Scoring Approach with AWS Bedrock and Vector Search. IJAIDSML [Internet]. 2025 Jun. 8 [cited 2026 Jun. 27];6(2):225-8. Available from: https://ijaidsml.org/index.php/ijaidsml/article/view/582