Hierarchical Multi-Class Text Categorization with Global Margin Maximization

Xipeng Qiu, School of Computer Science, Fudan University, xpqiu@
Wenjun Gao, School of Computer Science, Fudan University, wjgao616@
Xuanjing Huang, School of Computer Science, Fudan University, xjhuang@

Abstract
Text categorization is a crucial and well-proven method for organizing large-scale document collections. In this paper, we propose a hierarchical multi-class text categorization method with global margin maximization. We maximize not only the margins among leaf categories but also the margins among their ancestors. Experiments show that the performance of our algorithm is competitive with recently proposed hierarchical multi-class classification algorithms.

1 Introduction
In the past several years, hierarchical text categorization has become an active research topic in both the database area (Koller and Sahami, 1997; Weigend et al., 1999) and the machine learning area (Rousu et al., 2006; Cai and Hofmann, 2007). Hierarchical categorization methods can be divided into two types: local and global approaches (Wang et al., 1999; Sun and Lim, 2001). A local approach usually proceeds in a top-down fashion: it first picks the most relevant categories at the top level and then recursively makes the choice among the lower-level categories. A global approach builds only one classifier to discriminate all categories in a hierarchy.
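The local, top-down procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy hierarchy `TREE`, the per-node weight vectors `WEIGHTS`, and the helper names are hypothetical, and each node is scored with a plain linear function.

```python
# Hypothetical toy hierarchy: each internal node maps to its children.
TREE = {
    "root": ["sports", "science"],
    "sports": ["football", "tennis"],
    "science": ["physics", "biology"],
}

# Hypothetical per-node linear weights over a 2-dimensional feature vector.
WEIGHTS = {
    "sports": [1.0, 0.0], "science": [0.0, 0.9],
    "football": [1.0, 0.0], "tennis": [0.0, 1.0],
    "physics": [0.0, 2.0], "biology": [1.0, 0.0],
}


def dot(w, x):
    """Linear score of feature vector x under weight vector w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def top_down_predict(x, tree, weights, node="root"):
    """Local approach: at each level keep only the best-scoring child.

    Returns the chosen path below the root. Branches that lose at an
    upper level are never revisited.
    """
    path = []
    while node in tree:  # stop once a leaf (a node with no children) is reached
        node = max(tree[node], key=lambda child: dot(weights[child], x))
        path.append(node)
    return path


print(top_down_predict([0.6, 0.5], TREE, WEIGHTS))  # ['sports', 'football']
```

With these toy weights, the sports node outscores science at the top level (0.6 versus 0.45). As a result, the search never reaches physics, even though physics has the highest leaf score (1.0 versus 0.6 for football). This is the kind of irrecoverable high-level error that a global approach tries to avoid.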
Because the global approach avoids the irrecoverable errors that a local approach makes when it chooses a wrong category at a high level, it is more popular in the machine learning community. The essential idea behind the global approach is that nearby class nodes share some common underlying factors. In particular, descendant classes can share the characteristics of their ancestor classes, which is similar to multi-task learning (Caruana, 1997). A key problem for global hierarchical categorization is how to combine these underlying factors. In this paper, we propose an …
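A common way to let descendant classes inherit the factors of their ancestors is to give every node its own weight vector and to score a leaf by summing the linear scores of all nodes on its root-to-leaf path. This formulation is assumed here for illustration and is not necessarily the method proposed in this paper. A single argmax over all leaves then acts as one global classifier. The hierarchy, weights, and function names below are hypothetical.

```python
# Hypothetical hierarchy given as child -> parent links.
PARENT = {
    "sports": "root", "science": "root",
    "football": "sports", "tennis": "sports",
    "physics": "science", "biology": "science",
}
LEAVES = ["football", "tennis", "physics", "biology"]

# Hypothetical per-node linear weights over a 2-dimensional feature vector.
WEIGHTS = {
    "sports": [1.0, 0.0], "science": [0.0, 0.9],
    "football": [1.0, 0.0], "tennis": [0.0, 1.0],
    "physics": [0.0, 2.0], "biology": [1.0, 0.0],
}


def dot(w, x):
    """Linear score of feature vector x under weight vector w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def path_to_root(node, parent):
    """Nodes from `node` up to, but excluding, the root."""
    path = []
    while node in parent:
        path.append(node)
        node = parent[node]
    return path


def leaf_score(leaf, x, weights, parent):
    # The leaf's score is the sum of the node scores on its path, so sibling
    # leaves share the contribution of their common ancestors.
    return sum(dot(weights[v], x) for v in path_to_root(leaf, parent))


def global_predict(x, weights, parent, leaves):
    """Global approach: one argmax over all leaves at once."""
    return max(leaves, key=lambda leaf: leaf_score(leaf, x, weights, parent))


print(global_predict([0.6, 0.5], WEIGHTS, PARENT, LEAVES))  # physics
```

For the input [0.6, 0.5], the sports node alone scores higher than science (0.6 versus 0.45). Physics still wins globally (1.0 + 0.45 = 1.45, versus 0.6 + 0.6 = 1.2 for football), because the decision is taken over complete paths rather than level by level. Sibling leaves share their parent's term, which is one concrete way for an ancestor's common factors to be shared by its descendants.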