This paper is available on arXiv under CC 4.0 license.
Authors:
(1) Zhihang Ren, University of California, Berkeley, equal contribution (Email: [email protected]);
(2) Jefferson Ortega, University of California, Berkeley, equal contribution (Email: [email protected]);
(3) Yifan Wang, University of California, Berkeley, equal contribution (Email: [email protected]);
(4) Zhimin Chen, University of California, Berkeley (Email: [email protected]);
(5) Yunhui Guo, University of Texas at Dallas (Email: [email protected]);
(6) Stella X. Yu, University of California, Berkeley and University of Michigan, Ann Arbor (Email: [email protected]);
(7) David Whitney, University of California, Berkeley (Email: [email protected]).
Table of Links
- Abstract and Intro
- Related Work
- VEATIC Dataset
- Experiments
- Discussion
- Conclusion
- More About Stimuli
- Annotation Details
- Outlier Processing
- Subject Agreement Across Videos
- Familiarity and Enjoyment Ratings and References
5. Discussion
Understanding how humans infer the emotions of others is essential for researchers' understanding of social cognition. When psychophysicists design experiments, they need specific stimulus sets. However, among published datasets, there is currently no context-based video dataset that contains continuous valence and arousal ratings. The lack of this kind of dataset also prevents researchers from developing computer vision algorithms for the corresponding tasks. Our proposed VEATIC dataset fills this important gap in the field of computer vision and will be beneficial for psychophysical studies of emotion recognition.
During data collection, participants continuously tracked and rated the emotions of target characters in the video clips, which differs from typical psychophysical experiments, where responses are collected after a delay. This design was vital for mimicking the real-time emotion processing that occurs in everyday life. Additionally, emotion processing is not instantaneous: it relies heavily on the accumulation of information over time to make accurate inferences about the emotions of others.
The strength of the VEATIC dataset is that it mimics how humans perceive emotions in the real world: continuously and in the presence of contextual information in both the temporal and spatial domains. Such a rich dataset is vital for future computer vision models and can push the boundaries of what current models can accomplish. With the creation of more rich datasets like VEATIC, it may be possible for future computer vision models to perceive emotions in real time while interacting with humans.