Chronic Kidney Disease (CKD) is a major global health concern affecting millions of people. Early identification and accurate prognosis of CKD are essential to reduce healthcare costs and improve patient outcomes. This study evaluates the predictive performance of three widely used machine learning classifiers (Decision Tree, Random Forest, and Naive Bayes) on a dataset of 400 clinical records with diverse patient attributes from the UCI Machine Learning Repository. Missing values, class imbalance, and feature redundancy were addressed, respectively, with K-Nearest Neighbors (KNN) imputation, the Synthetic Minority Oversampling Technique (SMOTE), and a combination of Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) for dimensionality reduction and feature selection. Rather than relying on a single model, this work conducts a comparative analysis to improve the interpretability and reliability of CKD prediction. The models were evaluated using accuracy, precision, recall, and F1-score. Random Forest achieved the highest accuracy (96%), followed by Decision Tree (94%) and Naive Bayes (90%). These results indicate the strong performance of ensemble and rule-based classifiers on structured clinical data, while also highlighting the trade-off between interpretability and predictive power. This study supports the integration of interpretable machine learning models into clinical decision-making pipelines for early CKD diagnosis. Future research will focus on improving model transparency and computational efficiency, and on deployment in real-time clinical environments.
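The class-balancing step can be illustrated with a minimal, dependency-free sketch of SMOTE's core interpolation idea: each synthetic minority-class row is placed at a random point on the segment between a minority row and one of its k nearest minority-class neighbours. This is not the authors' implementation (the abstract does not name one). The function names `smote` and `euclidean`, the defaults `k=5` and `seed=0`, and the toy data are hypothetical and chosen for illustration only. A real pipeline would more typically use a library implementation such as `SMOTE` from imbalanced-learn.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new minority-class rows (hypothetical helper).

    Each new row lies at a random point on the line segment between a
    randomly chosen minority row and one of its k nearest minority-class
    neighbours, which is the core interpolation step of SMOTE.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    rng = random.Random(seed)  # seeded so the output is reproducible
    k = min(k, len(minority) - 1)
    # Indices of the k nearest minority-class neighbours of every minority row.
    neighbours = []
    for i, row in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(row, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbour = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        # Interpolate feature by feature between the base row and its neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 6 majority rows (label 0) and 3 minority rows (label 1).
X = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.2], [1.1, 2.1], [0.9, 1.8], [1.3, 2.0],
     [5.0, 8.0], [5.5, 7.5], [4.8, 8.3]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

minority = [row for row, label in zip(X, y) if label == 1]
new_rows = smote(minority, n_synthetic=y.count(0) - y.count(1), k=2)
X_bal = X + new_rows
y_bal = y + [1] * len(new_rows)
print(y_bal.count(0), y_bal.count(1))  # prints: 6 6
```

Because the synthetic rows are interpolations of existing records, oversampling of this kind is typically applied to the training split only (after imputation), so that no synthetic information leaks into the held-out data used to report accuracy, precision, recall, and F1-score.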