Building a Confused Character Set for Chinese Spell Checking

Lung-Hao Lee, Wun-Syuan Wu, Jian-Hong Li, Yu-Chi Lin, and Yuen-Hsien Tseng.

In Proceedings of the 27th International Conference on Computers in Education (ICCE’19), pages 703-705.


Abstract

In this paper, we describe how we constructed a confused character set for Chinese spell checking. We use the SIGHAN 2013-2015 bakeoff datasets to measure how well the set suggests correct characters. Our confusion set significantly outperforms the existing confusion set in candidate selection for automatic spelling checkers.
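
The abstract does not spell out how suggestion performance is scored. As a rough illustration only, one plausible measure is the fraction of annotated errors whose correct character appears among the candidates the confusion set lists for the wrong character. The sketch below assumes that measure; the function name `candidate_coverage`, the dictionary layout, and the toy character pairs are all hypothetical and are not taken from the paper or from the SIGHAN data.

```python
def candidate_coverage(confusion_set, error_pairs):
    """Return the fraction of (wrong, correct) pairs whose correct
    character is listed as a candidate for the wrong character.

    confusion_set: dict mapping a character to a set of confusable characters.
    error_pairs:   list of (wrong_char, correct_char) tuples.
    Both structures are assumptions made for this illustration.
    """
    if not error_pairs:
        return 0.0
    hits = sum(
        1
        for wrong, correct in error_pairs
        if correct in confusion_set.get(wrong, set())
    )
    return hits / len(error_pairs)


# Toy data, invented for illustration (not from the SIGHAN bakeoffs).
confusion_set = {
    "己": {"已", "巳"},  # visually similar characters
    "再": {"在"},        # similar pronunciation
}
error_pairs = [("己", "已"), ("再", "在"), ("的", "得")]

coverage = candidate_coverage(confusion_set, error_pairs)
print(f"coverage = {coverage:.3f}")
```

Under this assumed measure, two confusion sets could be compared by running the same error pairs through each and comparing the resulting coverage values.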