Automated Program Repair (APR) is an area of software engineering research that focuses on automatically generating fixes for buggy code. Its primary objective is to ease the burden of identifying and correcting errors in large-scale projects. Fault localization, a critical stage of the APR pipeline, identifies the locations of bugs within the code. Although Python is now one of the most popular programming languages, most existing fault localization techniques are limited to real-world Java and C repositories. This paper proposes a graph-based representation of buggy code that uses control flow and data flow to capture both semantic and syntactic information. We also analyze a novel approach to fault localization, Class-Imbalanced Learning on Graphs (CILG), as an alternative to conventional methods of calculating program element suspiciousness scores. The proposed approach is trained and tested on real-world buggy Python code snippets extracted from the PyTraceBugs dataset and achieves a macro Area Under the Receiver Operating Characteristic Curve (AUC-ROC) score of 0.85. We also compare it with Graph Neural Network (GNN) models and gpt-3.5-turbo to demonstrate the effectiveness of our technique.
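For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro AUC-ROC score can be computed: a one-vs-rest AUC per class, averaged without class weighting so that the minority (buggy) class counts as much as the majority class. The function names and the toy labels and probabilities are hypothetical illustrations, not the paper's evaluation code or data.

```python
from typing import Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auc(labels: Sequence[int], class_scores: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of one-vs-rest AUCs over all classes."""
    n_classes = len(class_scores[0])
    aucs = []
    for c in range(n_classes):
        one_vs_rest = [1 if y == c else 0 for y in labels]
        scores_c = [row[c] for row in class_scores]
        aucs.append(binary_auc(one_vs_rest, scores_c))
    return sum(aucs) / n_classes


# Hypothetical toy data: six program elements, class 1 = buggy (minority).
labels = [0, 0, 0, 0, 1, 1]
probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.4, 0.6],
    [0.1, 0.9],
]
result = macro_auc(labels, probs)
print(result)
```

Because the average is unweighted, a model that ignores the rare buggy class cannot score well on this metric, which is why it suits class-imbalanced settings.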