Detection of Quotations and Inserted Clauses and its Application to Dependency Structure Analysis in Spontaneous Japanese

Ryoji Hamabe, Kiyotaka Uchimoto, Tatsuya Kawahara, Hitoshi Isahara

School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
National Institute of Information and Communications Technology, 3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan

Abstract

Japanese dependency structure is usually represented by relationships between phrasal units called bunsetsus. One of the biggest problems with dependency structure analysis in spontaneous speech is that clause boundaries are ambiguous. This paper describes a method for detecting the boundaries of quotations and inserted clauses, and a method for improving dependency accuracy by applying the detected boundaries to dependency structure analysis. The quotations and inserted clauses are determined by using an SVM-based text chunking method that considers information on morphemes, pauses, fillers, etc. Information on the automatically analyzed dependency structure is also used to detect the beginning of the clauses. Our evaluation experiment using the Corpus of Spontaneous Japanese (CSJ) showed that the automatically estimated boundaries of quotations and inserted clauses helped to improve the accuracy of dependency structure analysis.
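To make the chunking formulation concrete, the following is a minimal sketch of how clause boundaries can be encoded as one label per bunsetsu, which is the form a chunking classifier such as an SVM would predict. The B/I/O tag set, the function names, and the index convention are assumptions made for illustration; the text above states only that an SVM-based text chunking method is used, and the classifier and its features (morphemes, pauses, fillers) are omitted here.

```python
from typing import List, Tuple

# Hypothetical convention: a clause is a (start, end) pair of
# bunsetsu indices, both inclusive, counted from 0.
Span = Tuple[int, int]


def spans_to_iob(n: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping clause spans as one B/I/O tag per bunsetsu."""
    tags = ["O"] * n
    for start, end in spans:
        if not (0 <= start <= end < n):
            raise ValueError(f"span {(start, end)} is outside 0..{n - 1}")
        tags[start] = "B"              # first bunsetsu of the clause
        for i in range(start + 1, end + 1):
            tags[i] = "I"              # remaining bunsetsus of the clause
    return tags


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Decode a B/I/O tag sequence back into clause spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # a new clause closes the open one
                spans.append((start, i - 1))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i - 1))
                start = None
        elif tag == "I" and start is None:
            start = i                  # tolerate a stray I as a clause start
    if start is not None:
        spans.append((start, len(tags) - 1))
    return spans


if __name__ == "__main__":
    # Six bunsetsus with a quotation covering bunsetsus 1 to 3.
    tags = spans_to_iob(6, [(1, 3)])
    print(tags)
    print(iob_to_spans(tags))
```

This flat encoding cannot represent a quotation nested inside another clause; handling nesting would need a richer tag set or a separate pass per clause type.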
1 Introduction

The Spontaneous Speech Corpus and Processing Technology project sponsored the construction of the Corpus of Spontaneous Japanese (CSJ) (Maekawa et al., 2000). The CSJ is the biggest spontaneous speech corpus in the world, consisting of roughly 7M words with a total speech length of 700 hours, and is a collection of monologues such as academic presentations and simulated public speeches. The CSJ includes transcriptions of the speeches as well as audio recordings of them. Approximately one tenth of the speeches in the CSJ were manually annotated with various kinds of information such as morphemes, sentence boundaries, dependency structures, and .
