Chinese EmoBank: Building Valence-Arousal Resources for Dimensional Sentiment Analysis

Lung-Hao Lee, Jian-Hong Li, and Liang-Chih Yu*.

ACM Transactions on Asian and Low-Resource Language Information Processing (ACM TALLIP), 21(4), Article 65, pages 1-18.


Abstract

An increasing amount of research has recently focused on dimensional sentiment analysis that represents affective states as continuous numerical values on multiple dimensions, such as valence-arousal (VA) space. Compared to the categorical approach that represents affective states as distinct classes (e.g., positive and negative), the dimensional approach can provide more fine-grained (real-valued) sentiment analysis. However, dimensional sentiment resources with valence-arousal ratings are very rare, especially for the Chinese language. Therefore, this study aims to: (1) Build a Chinese valence-arousal resource called Chinese EmoBank, the first Chinese dimensional sentiment resource featuring various levels of text granularity including 5,512 single words, 2,998 multi-word phrases, 2,582 single sentences, and 2,969 multi-sentence texts. The valence-arousal ratings are annotated by crowdsourcing based on the Self-Assessment Manikin (SAM) rating scale. A corpus cleanup procedure is then performed to improve annotation quality by removing outlier ratings and improper texts. (2) Evaluate the proposed resource using different categories of classifiers such as lexicon-based, regression-based, and neural-network-based methods, and comparing their performance to a similar evaluation of an English dimensional sentiment resource.