Scientific report: "Discourse Type Clustering using POS n-gram Profiles and High-Dimensional Embeddings"

Christelle Cocco
Department of Computer Science and Mathematical Methods
University of Lausanne, Switzerland

Abstract

To cluster textual sequence types (discourse types/modes) in French texts, the K-means algorithm with high-dimensional embeddings and a fuzzy clustering algorithm were applied to clauses whose POS (part-of-speech) n-gram profiles had previously been extracted. Uni-, bi- and trigrams were used on four 19th-century French short stories by Maupassant. For high-dimensional embeddings, power transformations on the chi-squared distances between clauses were explored. Preliminary results show that high-dimensional embeddings improve the quality of clustering, in contrast to the use of bi- and trigrams, whose performance is disappointing, possibly because of feature space sparsity.

1 Introduction

The aim of this research is to cluster textual sequence types, named here discourse types [1], such as narrative, descriptive, argumentative and so on, in French texts, and especially in short stories, which can contain all types. For this purpose, texts were segmented into clauses (section .). To cluster the latter, n-gram POS (part-of-speech) tag profiles were extracted (section .). POS-tags were chosen because of their expected relation to discourse types.

Several authors have used POS-tags among other features for various text classification tasks, such as Biber (1988) for text type detection, Karlgren and Cutting (1994) and Malrieu and Rastier (2001) for genre classification, and Palmer et al. (2007) for situation entity classification. The latter is an essential component of English discourse modes (Smith, 2009). Moreover, previous work in discourse type detection has shown a dependency between ...

[1] Sequence type is an appropriate name because it refers to text passage type. However, it will be further mentioned as discourse types, a frequent French term. In English, a standard term is discourse modes.
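To make the pipeline described in the abstract more concrete, the following is a minimal sketch of two of its steps: building POS n-gram profiles per clause, and computing power-transformed chi-squared distances between those profiles. It is not the author's implementation. The function names, the toy tag sequences, the exponent name `q`, and the particular form of the chi-squared distance (the usual one from correspondence analysis, with squared differences of within-clause relative frequencies weighted by the inverse overall feature frequency) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def ngram_profile(tags, n):
    """Count the POS n-grams of one clause; `tags` is a list of POS-tag strings."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def chi2_distances(profiles, q=1.0):
    """Pairwise chi-squared distances between clause profiles, raised to the power q.

    Assumed form: sum over features k of (f_ik - f_jk)^2 / rho_k, where f_ik is
    the relative frequency of feature k in clause i and rho_k is the relative
    frequency of feature k over all clauses. `q` is a hypothetical parameter name.
    """
    sizes = [sum(p.values()) for p in profiles]
    if any(size == 0 for size in sizes):
        raise ValueError("every clause needs at least one n-gram")
    features = sorted(set().union(*profiles))
    total = sum(sizes)
    # Overall weight of each feature across all clauses.
    rho = {k: sum(p[k] for p in profiles) / total for k in features}
    # Within-clause relative frequencies (row profiles).
    rows = [{k: p[k] / size for k in features} for p, size in zip(profiles, sizes)]
    m = len(profiles)
    dist = [[0.0] * m for _ in range(m)]
    for i, j in combinations(range(m), 2):
        d = sum((rows[i][k] - rows[j][k]) ** 2 / rho[k] for k in features)
        dist[i][j] = dist[j][i] = d ** q
    return dist


# Toy clauses with made-up POS tags (not taken from the Maupassant corpus).
clauses = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "VERB"],
    ["PRON", "VERB", "ADV"],
]
profiles = [ngram_profile(c, 1) for c in clauses]
dist = chi2_distances(profiles, q=0.5)
```

The resulting matrix could then be passed to an embedding or clustering step. With bi- or trigrams (`n=2` or `n=3`), most features occur in very few clauses, which gives a rough intuition for the feature space sparsity mentioned in the abstract.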
