Predicting rare and disruptive severe weather events presents significant challenges due to class imbalance and data sparsity. Conventional oversampling techniques and unimodal approaches are inadequate for these low-frequency phenomena because they fail to capture the events' intrinsic complexity and spatiotemporal dynamics, and current methods lack the ability to learn modality-specific representations. Here, we introduce a robust multimodal fusion strategy that directly integrates primary sensor measurements with supplementary modalities, including textual descriptions and weather forecasts, within a tri-modality framework. Our approach is augmented by advanced spatiotemporal feature engineering, ensuring that both spatial and temporal relationships are preserved and effectively leveraged. Notably, our proposed method, which incorporates Automated Surface Observing System (ASOS) sensor data, textual embeddings, and forecast data, achieves substantial performance improvements, raising the macro F1-score from 0.04 to 0.89 across a ten-class framework (nine severe event classes and one normal class) at a 12-hour forecasting horizon. This integrated approach helps overcome data sparsity, particularly in high-latitude regions. Ultimately, the framework provides an effective early warning system for disaster risk assessment and infrastructure resilience forecasting.