27th EAAAI (EANN) 2026, 16 - 19 July 2026, Chania, Crete, Greece

ORPHEAS: A Cross-Lingual Greek--English Embedding Model for Retrieval-Augmented Generation

Livieris Ioannis, Koursaris Athanasios, Apostolopoulou Alexandra, Kanaris Konstantinos, Tsakalidis Dimitris, Domalis Giorgos

Abstract:

Effective retrieval-augmented generation in bilingual Greek--English applications requires embedding models that capture both domain-specific semantic relationships and cross-lingual semantic alignment. Existing multilingual embedding models distribute their representational capacity across numerous languages, which limits their optimization for Greek and leaves them unable to encode the morphological complexity and domain-specific terminological structures inherent in Greek text. In this work, we propose ORPHEAS, a specialized Greek--English embedding model for bilingual retrieval-augmented generation. ORPHEAS is trained on a high-quality dataset generated by a knowledge graph-based fine-tuning methodology applied to a diverse multi-domain corpus, which enables language-agnostic semantic representations. Numerical experiments on monolingual and cross-lingual retrieval benchmarks show that ORPHEAS outperforms state-of-the-art multilingual embedding models, demonstrating that domain-specialized fine-tuning on morphologically complex languages does not compromise cross-lingual retrieval capability.

