APA Style
James Thomas Black, Muhammad Zeeshan Shakir. (2025). AI Enabled Facial Emotion Recognition Using Low-Cost Thermal Cameras. Computing&AI Connect, 2 (Article ID: 0019). https://doi.org/Registering DOI
MLA Style
James Thomas Black, Muhammad Zeeshan Shakir. "AI Enabled Facial Emotion Recognition Using Low-Cost Thermal Cameras". Computing&AI Connect, vol. 2, 2025, Article ID: 0019, https://doi.org/Registering DOI.
Chicago Style
James Thomas Black, Muhammad Zeeshan Shakir. 2025. "AI Enabled Facial Emotion Recognition Using Low-Cost Thermal Cameras." Computing&AI Connect 2 (2025): 0019. https://doi.org/Registering DOI.
Volume 2, Article ID: 2025.0019
James Thomas Black
james.black@uws.ac.uk
Muhammad Zeeshan Shakir
Muhammad.shakir@uws.ac.uk
1 University of the West of Scotland, School of Computing, Engineering & Physical Sciences, University Avenue, Ayr KA8 0SX, Scotland, United Kingdom.
* Author to whom correspondence should be addressed
Received: 11 Mar 2025 Accepted: 25 Jun 2025 Available Online: 30 Jun 2025
While expensive hardware has historically dominated emotion recognition, our research explores the viability of cost-effective alternatives by utilising IoT-based low-resolution cameras with Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs). In this work, we introduce a novel dataset specifically for thermal facial expression recognition and conduct a comprehensive performance analysis using ResNet, a standard ViT model developed by Google, and a modified ViT model tailored for training on smaller datasets. This allows us to compare the efficacy of the more recent ViT architecture against the traditional CNN. Our findings reveal that not only do ViT models learn more swiftly than ResNet, but they also demonstrate superior performance across all metrics on our dataset. Furthermore, our investigation extends to the Kotani Thermal Facial Emotion (KTFE) test set, where we evaluate the generalisation capability of these models when trained using a hybrid approach that combines our dataset with the KTFE dataset. Both ResNet and the ViT model by Google achieved high performance on the KTFE test samples, suggesting that leveraging diverse data sources can significantly strengthen model robustness and adaptability. This study highlights three critical implications: the promising role of accessible and affordable thermal imaging technology in emotion classification; the potential of ViT models to redefine state-of-the-art approaches in this domain; and the importance of dataset diversity in training models with greater generalisation power. By bridging the gap between affordability and sophistication, this research contributes valuable insights into the fields of emotion recognition and affective computing.
Disclaimer: This is not the final version of the article. Changes may occur when the manuscript is published in its final format.