Credit Card Customer Profiling Using Self-Supervised Representation Learning on Multi-Source Financial Data
DOI: https://doi.org/10.63282/3050-9262.IJAIDSML-V6I1P118

Keywords: Self-Supervised Learning, Multi-Source Data Integration, Representation Learning, Financial Analytics, Customer Segmentation, Contrastive Learning, Deep Learning, Churn Prediction

Abstract
Recent advances in machine learning and data integration have pushed financial analytics to new heights. One area that has gained particular traction is credit card customer profiling, which aims to identify and classify customer behavior, preferences, and risk. Traditional approaches depend largely on supervised learning, which requires large volumes of labeled data. With the emergence of Self-Supervised Learning (SSL), however, it has become possible to derive meaningful representations directly from unlabeled, heterogeneous data sources. This paper presents an original framework for credit card customer profiling based on self-supervised representation learning over multi-source financial data. Transaction records, customer demographics, online banking activity, and credit scores are fused into a single analytical model. The method combines contrastive learning with transformer-based architectures to learn robust feature embeddings. On a real-world financial dataset collected prior to February 2025, which constituted the bulk of our experiments, the framework substantially outperforms baselines in profiling accuracy, clustering quality, and downstream tasks such as creditworthiness assessment and churn prediction. We report detailed comparisons against standard models, analyze the benefits of multi-source fusion, and discuss the implications for personalized financial services. Thorough visualizations, flowcharts, and ablation studies complement our findings.
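To make the contrastive pretraining objective concrete, the following is a minimal NumPy sketch of a SimCLR-style NT-Xent loss applied to two randomly masked views of fused customer feature vectors. This is an illustrative assumption, not the paper's implementation: the function name `nt_xent_loss`, the random linear "encoder", the feature-dropout augmentation, and all shapes and rates are hypothetical stand-ins for the transformer encoder and augmentations described above.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss.

    z1, z2: (N, d) embeddings of two augmented views of the same N customers.
    Each row in z1 is pulled toward its counterpart in z2 and pushed away
    from every other embedding in the batch.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, d) stacked views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize -> cosine sims
    sim = z @ z.T / temperature                        # (2N, 2N) similarity matrix
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # the positive pair for row i is row i+n (and vice versa)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 16))          # 32 customers, 16 fused multi-source features
W = rng.normal(size=(16, 8)) / 4.0     # stand-in linear "encoder" (a transformer in the paper)
mask = lambda A: A * (rng.random(A.shape) > 0.2)  # feature-dropout augmentation
z1, z2 = mask(X) @ W, mask(X) @ W      # two views of the same customers
loss = nt_xent_loss(z1, z2)
```

Because the two views share most of their underlying features, their embeddings stay similar, and minimizing this loss yields representations that cluster customers by behavior without any labels.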