| Facial expression recognition algorithms, particularly those employing supervised deep learning, often rely on extensive training data. Most datasets focus on frontal views, which limits their use in diverse scenarios, while variations in lighting or distance further complicate data collection and recognition. To overcome these challenges, we propose a method that generates comprehensive multimodal synthetic visual data for both active and static facial expression recognition methods. The generated data consist of (i) sequences of facial 3D models, (ii) 3D simulation scenes, and (iii) synthetic images/videos captured from multiple viewpoints, particularly non-frontal angles, and under varying lighting conditions. Based on the proposed methodology, a novel dataset was generated, which is publicly available. The validity of the generated data was assessed by training an expression recognition method in two experimental setups: one using real-world frontal-only data and the other combining real-world data with the generated synthetic data. The method trained in the latter setup demonstrated an accuracy improvement of over 10% when evaluated on a separate dataset containing footage from various viewpoints. The results highlight the positive impact of synthetic data on the performance of expression recognition methods, especially in non-trivial cases not covered by the real-world training data. |