Scientific report: "Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing"


Valentin I. Spitkovsky, Computer Science Department, Stanford University and Google Inc., valentin@google.com
Daniel Jurafsky, Departments of Linguistics and Computer Science, Stanford University, jurafsky@stanford.edu
Hiyan Alshawi, Google Inc., hiyan@google.com

Abstract

We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we refine approximate partial phrase boundaries to yield accurate parsing constraints. Conversion procedures fall out of our linguistic analysis of a newly available million-word hyper-text corpus. We demonstrate that derived constraints aid grammar induction by training Klein and Manning's Dependency Model with Valence (DMV) on this data set: parsing accuracy on Section 23 (all sentences) of the Wall Street Journal corpus jumps to 50.4%, beating previous state-of-the-art by more than 5%. Web-scale experiments show that the DMV, perhaps because it is unlexicalized, does not benefit from orders of magnitude more annotated but noisier data. Our model, trained on a single blog, generalizes to 53.3% accuracy out-of-domain, against the Brown corpus, nearly 10% higher than the previous published best. The fact that web mark-up strongly correlates with syntactic structure may have broad applicability in NLP.

1 Introduction

Unsupervised learning of hierarchical syntactic structure from free-form natural language text is a hard problem whose eventual solution promises to benefit applications ranging from question answering to speech recognition and machine translation. A restricted version of this problem that targets dependencies and assumes partial annotation (sentence boundaries and part-of-speech (POS) tagging) has received much attention. Klein and Manning (2004) were the first to beat a simple parsing heuristic, the right-branching baseline; today's state-of-the-art systems (Headden et al., 2009; Cohen
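
To make the notion of "raw bracketings" from the abstract concrete, the following is a minimal Python sketch of how token spans covered by the four tags (anchors, bold, italics, underlines) might be read off a marked-up sentence. It is an illustration only, not the authors' conversion procedure: the names `BracketExtractor` and `extract_brackets`, the whitespace tokenization, and the `(tag, start, end)` span format are all assumptions made for this example.

```python
from html.parser import HTMLParser

# The four tag types named in the abstract: anchors, bold, italics, underlines.
MARKUP_TAGS = {"a", "b", "i", "u"}


class BracketExtractor(HTMLParser):
    """Collect whitespace tokens and the token spans covered by mark-up tags.

    Hypothetical helper for illustration; tokenization is a plain split(),
    so a tag that starts or ends inside a word is not handled precisely.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []    # sentence tokens with mark-up stripped
        self.brackets = []  # (tag, start, end) token spans, end exclusive
        self._open = []     # stack of (tag, start index) for open tags

    def handle_starttag(self, tag, attrs):
        if tag in MARKUP_TAGS:
            self._open.append((tag, len(self.tokens)))

    def handle_endtag(self, tag):
        # Close the most recent matching open tag; ignore stray closers.
        for k in range(len(self._open) - 1, -1, -1):
            if self._open[k][0] == tag:
                _, start = self._open.pop(k)
                end = len(self.tokens)
                if end > start:  # skip empty mark-up such as <b></b>
                    self.brackets.append((tag, start, end))
                break

    def handle_data(self, data):
        self.tokens.extend(data.split())


def extract_brackets(html_sentence):
    """Return (tokens, brackets) for one marked-up sentence."""
    parser = BracketExtractor()
    parser.feed(html_sentence)
    parser.close()  # flush any buffered trailing text
    return parser.tokens, parser.brackets


if __name__ == "__main__":
    sentence = 'The <a href="x">Wall Street Journal</a> is <b>widely</b> read.'
    tokens, brackets = extract_brackets(sentence)
    print(tokens)
    print(brackets)
```

Each resulting span marks a candidate phrase boundary. Turning such spans into parsing constraints, as the abstract describes, would require the refinement step that this sketch does not attempt.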
