| Emotion recognition from electroencephalogram (EEG) signals plays a crucial role in applications such as mental health monitoring and understanding brain functions. However, accurately identifying the most informative features for computer-based emotion recognition remains a significant challenge. Although recent deep learning approaches have made progress, feature inconsistency across modalities still limits performance. To this end, this paper presents a novel Wavelet Transform-guided emotion recognition framework with cross-modal feature consistency. The proposed approach combines wavelet-transformed features of EEG signals with statistical features to enhance representation learning. Feature extraction is performed using a combination of Artificial Neural Network (ANN), 1D Convolutional Neural Network (CNN), and Long Short-Term Memory (LSTM) models, improving the robustness of emotion recognition. To align these heterogeneous features, a Modality-Agnostic Consistency loss function is introduced, which combines categorical cross-entropy loss with a newly designed Feature Alignment loss. This loss promotes consistency across modalities, ensuring the extraction of complementary and coherent features and thereby improving learning. The framework is evaluated on three benchmark EEG datasets, namely SEED, CASE, and EEG Brainwave. Results demonstrate that the method achieves state-of-the-art performance, highlighting its effectiveness in improving cross-modal feature alignment and thereby advancing EEG-based emotion recognition. Code is available at: https://github.com/asfakali/Hybrid-Emotion-Recognition. | 