Scientific report: "Automatic Labelling of Topic Models"

Jey Han Lau, Karl Grieser, David Newman and Timothy Baldwin
NICTA Victoria Research Laboratory
Dept of Computer Science and Software Engineering, University of Melbourne
Dept of Computer Science, University of California Irvine
jhlau@ kgrieser@ newman@ tb@

Abstract

We propose a method for automatically labelling topics learned via LDA topic models. We generate our label candidate set from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures and lexical features, optionally fed into a supervised ranking model. Our method is shown to perform strongly over four independent sets of topics, significantly better than a benchmark method.

1 Introduction

Topic modelling is an increasingly popular framework for simultaneously soft-clustering terms and documents into a fixed number of topics, which take the form of a multinomial distribution over terms in the document collection (Blei et al., 2003). It has been demonstrated to be highly effective in a wide range of tasks, including multi-document summarisation (Haghighi and Vanderwende, 2009), word sense discrimination (Brody and Lapata, 2009), sentiment analysis (Titov and McDonald, 2008), information retrieval (Wei and Croft, 2006), and image labelling (Feng and Lapata, 2010).
One standard way of interpreting a topic is to use the marginal probabilities $p(w_i \mid t_j)$ associated with each term $w_i$ in a given topic $t_j$ to extract the 10 terms with the highest marginal probability. This results in term lists such as [1]:

stock market investor fund trading investment firm exchange companies share

which are clearly associated with the domain of stock market trading. The aim of this

[1] Here and throughout the paper, we will represent a topic $t_j$ via its ranking of top-10 topic terms, based on $p(w_i \mid t_j)$.
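The top-10 representation described above can be illustrated with a minimal Python sketch. It assumes a topic is available as a mapping from each term to its marginal probability under that topic; the function name `top_terms` and the probabilities in `toy_topic` are hypothetical, chosen for illustration only, and do not come from the paper.

```python
from heapq import nlargest
from typing import Dict, List


def top_terms(term_probs: Dict[str, float], n: int = 10) -> List[str]:
    """Return the n terms with the highest probability, highest first."""
    # Iterating a dict yields its keys (the terms); each term is ranked by
    # its probability under the topic.
    return nlargest(n, term_probs, key=term_probs.get)


# Hypothetical probabilities for a single topic (illustrative values only;
# a real topic distribution covers the whole vocabulary and sums to 1).
toy_topic = {
    "stock": 0.090, "market": 0.080, "investor": 0.070, "fund": 0.060,
    "trading": 0.050, "investment": 0.045, "firm": 0.040, "exchange": 0.035,
    "companies": 0.030, "share": 0.025, "bank": 0.020, "price": 0.015,
}

# Print the topic as a space-separated list of its top-10 terms.
print(" ".join(top_terms(toy_topic)))
```

With a trained LDA model, the same ranking would be applied to each topic's row of the topic-term matrix in turn, giving one term list per topic.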
