Facial features carry information about a person's emotions, motor function, or genetic defects. Since most current mobile devices support real-time face detection via cameras and depth sensors, real-time facial analysis can serve a broad range of mobile use cases. Understanding the real-time emotion recognition capabilities of device sensors and frameworks is therefore essential for developing valid new applications. We evaluated on-device emotion recognition using Apple's ARKit on an iPhone 14 Pro. A native app elicited 36 blend-shape-specific movements and 7 discrete emotions from (Formula presented.) healthy adults. Per frame, standardized ARKit blend shapes were classified with a prototype-based cosine-similarity metric; performance was summarized as accuracy and area under the receiver operating characteristic curve (AUC). The cosine-similarity classifier achieved an overall accuracy of (Formula presented.), exceeding the mean of three human raters ((Formula presented.); a gain of (Formula presented.) percentage points, ≈16% relative). Per-emotion accuracy was highest for joy, fear, sadness, and surprise, and competitive for anger, disgust, and contempt. AUCs were ≥0.84 for all classes. The method runs in real time on-device using only vector operations, preserving privacy and minimizing compute. These results indicate that a simple, interpretable cosine-similarity classifier over ARKit blend shapes delivers human-comparable, real-time facial emotion recognition on commodity hardware, supporting privacy-preserving mobile applications.
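
The abstract describes the classifier only at a high level; the Swift sketch below illustrates the general idea of prototype-based cosine-similarity classification over per-frame blend shape coefficient vectors. The prototype values, vector dimensionality, and function names are hypothetical placeholders, assuming input vectors like those exposed on device by ARKit's ARFaceAnchor.blendShapes (52 coefficients, each in [0, 1]); the study's actual prototypes and preprocessing are not specified here.

```swift
import Foundation

/// Cosine similarity between two equal-length coefficient vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have equal length")
    let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
    let normA = sqrt(a.reduce(0) { $0 + $1 * $1 })
    let normB = sqrt(b.reduce(0) { $0 + $1 * $1 })
    guard normA > 0, normB > 0 else { return 0 }  // e.g. a fully neutral face yields a zero vector
    return dot / (normA * normB)
}

/// Classify one frame's blend shape vector by picking the emotion
/// whose prototype has the highest cosine similarity to the frame.
func classify(frame: [Double],
              prototypes: [String: [Double]]) -> (emotion: String, similarity: Double)? {
    prototypes
        .map { (emotion: $0.key, similarity: cosineSimilarity(frame, $0.value)) }
        .max { $0.similarity < $1.similarity }
}

// Hypothetical 4-dimensional toy prototypes; real ARKit vectors have
// 52 blend shape coefficients, and real prototypes would be derived
// from recorded data rather than hand-picked values.
let prototypes: [String: [Double]] = [
    "joy":      [0.9, 0.8, 0.1, 0.0],
    "surprise": [0.1, 0.0, 0.9, 0.8],
    "sadness":  [0.0, 0.1, 0.2, 0.7],
]

let frame = [0.85, 0.75, 0.15, 0.05]  // one frame's coefficients (toy values)
if let result = classify(frame: frame, prototypes: prototypes) {
    print("\(result.emotion) (cosine similarity \(result.similarity))")
}
```

Because classification reduces to a handful of dot products and norms per frame, a sketch of this kind is consistent with the abstract's claim that the method runs in real time on device using only vector operations, without any server round trip.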