| The rapid progress of large language models (LLMs) has unlocked significant capabilities across diverse domains, yet has also raised critical concerns regarding the trustworthiness and safety of these systems. Although many studies address isolated risks such as hallucinations, bias, or vulnerability to adversarial prompts, comprehensive frameworks for systematically evaluating LLM trustworthiness remain scarce. This research proposes a novel evaluation framework to benchmark LLMs along key dimensions: honesty, bias mitigation, calibration, consistency, and resistance to deception. Drawing inspiration from recent safety benchmarks, we design specialized evaluation protocols for each dimension and compute aggregate trustworthiness scores using weighted combinations of individual metrics. Experiments on prominent LLMs demonstrate significant variability in performance across dimensions, highlighting strengths and weaknesses of current models. Our results underscore the need for multidimensional safety evaluations and provide practical tools for developers, policymakers, and researchers looking to build and deploy more trustworthy AI systems. |
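The abstract does not give the weights or the exact aggregation formula, so the following is only a minimal sketch of one plausible reading of "weighted combinations of individual metrics": a normalized weighted mean over the five named dimensions. The function name `aggregate_trust_score`, the dimension keys, the assumption that each per-dimension score lies in [0, 1], and all numeric values are hypothetical illustrations, not the authors' implementation or results.

```python
from typing import Mapping

# Dimension names taken from the abstract; the identifier spellings are assumptions.
DIMENSIONS = (
    "honesty",
    "bias_mitigation",
    "calibration",
    "consistency",
    "deception_resistance",
)


def aggregate_trust_score(
    scores: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Return the weighted mean of per-dimension scores.

    Assumes each score is already normalized to [0, 1] (hypothetical convention).
    Weights need not sum to 1; they are normalized here.
    """
    missing = [d for d in DIMENSIONS if d not in scores or d not in weights]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")

    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_sum = sum(weights[d] * scores[d] for d in DIMENSIONS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Placeholder inputs for illustration only; not measured results.
    example_scores = {d: 0.5 for d in DIMENSIONS}
    example_weights = {d: 1.0 for d in DIMENSIONS}
    print(aggregate_trust_score(example_scores, example_weights))
```

Normalizing by the total weight keeps the aggregate on the same scale as the individual scores, so models stay comparable if the weighting scheme changes. A single aggregate can hide the per-dimension variability the abstract emphasizes, so it is best reported next to the individual scores.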