Accurately evaluating research impact is crucial in academia, influencing funding, promotions, and recognition. However, excessive self-citation distorts citation metrics, undermining fair assessment. This work introduces a novel approach to detecting anomalous self-citations using citation network analysis and advanced Natural Language Processing with Large Language Models (LLMs). A citation network is constructed from a large-scale academic dataset, where nodes represent papers and authors, and edges capture citation relationships. Self-citation loops are identified using graph-based techniques. A two-stage summarization process is implemented to generate a comprehensive summary. Regular expressions facilitate citation context detection, while prompt fine-tuning through self-contrast improves the LLMs' ability to classify essential vs. non-essential citations, reducing prompt loss to 0.082. Extensive testing confirms the effectiveness of this approach, with o1-mini achieving 91.84% accuracy on 49 self-citation cases across a set of authors. The findings provide actionable insights to enhance transparency and fairness in research evaluation. By addressing ethical concerns in scholarly publishing, this research promotes integrity and equitable academic assessments.
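The two detection steps named in the abstract, graph-based identification of self-citation edges and regex-based citation context extraction, can be sketched as follows. This is a minimal illustration only: the paper IDs, author names, and the numeric-bracket citation style are assumed for the example, and the paper's actual dataset, graph tooling, and LLM classification stage are not shown.

```python
import re

# Toy corpus (illustrative only): each paper lists its author set
# and the papers it cites.
papers = {
    "P1": {"authors": {"A. Smith", "B. Jones"}, "cites": ["P2", "P3"]},
    "P2": {"authors": {"A. Smith"}, "cites": ["P3"]},
    "P3": {"authors": {"C. Lee"}, "cites": []},
}

def self_citations(papers):
    """Return (citing, cited) edges whose author sets overlap --
    a simple graph-based proxy for self-citation detection."""
    hits = []
    for pid, meta in papers.items():
        for cited in meta["cites"]:
            if meta["authors"] & papers[cited]["authors"]:
                hits.append((pid, cited))
    return hits

# Regex-based citation context detection: capture the sentence
# surrounding a numeric citation marker such as [12].
CITATION = re.compile(r"[^.]*\[\d+\][^.]*\.")

def citation_contexts(text):
    return [m.group(0).strip() for m in CITATION.finditer(text)]

print(self_citations(papers))
print(citation_contexts(
    "Prior work [3] studied loops. We extend it. See also [7] for surveys."))
```

In the full pipeline described above, the extracted contexts would then be summarized and passed to the LLM for essential vs. non-essential classification.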