| We propose a hybrid framework for knowledge extraction from large text corpora that combines aspect-conditioned LLM summarization with clustering-based topic modeling. The approach selects the most semantically stable prompting strategy via entropy minimization, generates aspect-focused summaries, and applies QA-based filtering before clustering. Applied to aviation incident reports, decomposed chain-of-thought prompting substantially reduces semantic instability compared to zero-shot generation while preserving macro-level thematic structure. The resulting topics are more causally oriented, improving interpretability while maintaining structural fidelity to the underlying corpus. |
*** Title, author list and abstract as submitted during Camera-Ready version delivery. Small changes that may have occurred during processing by Springer may not appear in this window.
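
The abstract's strategy-selection step (choosing the most semantically stable prompting strategy via entropy minimization) can be illustrated with a minimal sketch. The abstract does not say how entropy is computed, so this assumes one plausible reading: each strategy is sampled several times per document, each generation is assigned a semantic-group label by some upstream grouping step, and the strategy with the lowest mean Shannon entropy over those labels is selected. The function names, the strategy names, and the toy labels are all hypothetical and are not results from the paper.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = len(labels)
    # Sum of p * log2(1/p) over the observed label frequencies.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mean_entropy(samples_per_doc):
    """Average label entropy across documents for one prompting strategy."""
    return sum(label_entropy(s) for s in samples_per_doc) / len(samples_per_doc)


def select_strategy(runs):
    """Return the strategy name whose generations have the lowest mean entropy."""
    return min(runs, key=lambda name: mean_entropy(runs[name]))


# Hypothetical toy data: for each strategy, one list per document holding the
# semantic-group label assigned to each repeated generation (assumed to come
# from an upstream grouping step that is not shown here).
runs = {
    "zero_shot": [["a", "b", "c", "a"], ["a", "b", "b", "c"]],
    "decomposed_cot": [["a", "a", "a", "a"], ["a", "a", "b", "a"]],
}

best = select_strategy(runs)
print(best)
```

Under this reading, a strategy whose repeated generations all fall into one semantic group has zero entropy for that document, so lower mean entropy corresponds to more stable output.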