tailieunhanh - Scientific report: "Test Collection Selection and Gold Standard Generation for a Multiply-Annotated Opinion Corpus"

Lun-Wei Ku, Yong-Shen Lo, and Hsin-Hsi Chen
Department of Computer Science and Information Engineering
National Taiwan University
lwku yslo @ hhchen@

Abstract

Opinion analysis has been an important research topic in recent years. However, there are no common methods for creating evaluation corpora. This paper introduces a method for developing opinion corpora with multiple annotators. The characteristics of the created corpus are discussed, and methodologies are proposed for selecting more consistent testing collections and their corresponding gold standards. An opinion extraction system is then evaluated under these gold standards. The experimental results show some interesting phenomena.

1 Introduction

Opinion information processing has been studied for several years. Researchers have extracted opinions from words, sentences, and documents, and both rule-based and statistical models have been investigated (Wiebe et al., 2002; Pang et al., 2002). Precision, recall, and F-measure are the evaluation metrics usually adopted. A reliable corpus is very important for opinion information processing because the annotation of opinions depends on human perspectives. Although the corpora created by researchers have been analyzed (Wiebe et al., 2002), methods for increasing their reliability have seldom been addressed.
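
As a rough illustration of the precision, recall, and F-measure metrics mentioned above, the following Python sketch scores a set of system-extracted opinion sentences against a gold standard. The function name precision_recall_f, the use of sentence IDs, and the sample data are assumptions made for illustration and are not taken from the paper. The F-measure is computed here as the balanced harmonic mean (F1), which is also an assumption, since the text does not state the weighting.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F-measure) for two collections of item IDs.

    F-measure is the balanced harmonic mean of precision and recall (F1).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical sentence IDs: the system marks four sentences as opinionated,
# while the gold standard contains five.
system_output = {1, 2, 3, 7}
gold_standard = {1, 2, 4, 5, 7}

p, r, f = precision_recall_f(system_output, gold_standard)
print(f"precision={p:.2f} recall={r:.2f} f-measure={f:.2f}")
```

With these hypothetical sets, three of the four extracted sentences appear in the gold standard, so the script prints precision=0.75 recall=0.60 f-measure=0.67. Because the scores depend entirely on which sentences the gold standard contains, the way a gold standard is derived from multiple annotators directly affects the reported performance.
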
The strict and lenient metrics for opinions have been mentioned but not discussed in detail together with the corpora and their annotations. This paper discusses the selection of testing collections and the generation of the corresponding gold standards under multiple annotations. These testing collections are then used to evaluate an opinion extraction system against the corresponding gold standards. The analysis of human annotations makes improvements to opinion analysis systems feasible.

2 Corpus Annotation

Opinion corpora are constructed for the research .