Scientific report: "Importance of linguistic constraints in statistical dependency parsing"
Bharat Ram Ambati
Language Technologies Research Centre, IIIT-Hyderabad, Gachibowli, Hyderabad, India - 500032.
ambati@

Abstract

Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, their usefulness improves considerably. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint: a verb should not have multiple subjects/objects as its children in the dependency tree. We first describe the importance of this constraint, taking Machine Translation systems that use dependency parser output as an example application. We then show how the current state-of-the-art dependency parsers violate this constraint. We present two new methods to handle this constraint. We evaluate our methods on the state-of-the-art dependency parsers for Hindi and Czech.

1 Introduction

Parsing is one of the major tasks that help in understanding natural language. It is useful in several natural language applications; machine translation, anaphora resolution, word sense disambiguation, question answering, and summarization are a few of them. This led to the development of grammar-driven, data-driven, and hybrid parsers. Due to the availability of annotated corpora in recent years, data-driven parsing has achieved considerable success.
The availability of a phrase structure treebank for English (Marcus et al., 1993) has led to the development of many efficient parsers. Using dependency analysis, a similar large-scale annotation effort for Czech has been the Prague Dependency Treebank (Hajičová, 1998). Unlike English, Czech is a free-word-order language and is also morphologically very rich. It has been suggested that free-word-order languages can be handled better using the dependency-based framework than the constituency-based one.
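
The constraint stated in the abstract (a verb should not have more than one subject or more than one object among its children) can be illustrated with a small check over a parsed sentence. The following is only a minimal sketch under stated assumptions: the `Token` record, the `VERB` tag, and the `subj`/`obj` labels are hypothetical placeholders rather than the tagset or label set of the Hindi or Czech treebanks, and the function only detects violations; it is not one of the two methods the paper proposes for handling them.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    pos: str     # part-of-speech tag (hypothetical tagset)
    head: int    # index of the head token, 0 for the root
    deprel: str  # dependency label (hypothetical label set)


# Labels assumed to occur at most once per verb (illustrative only).
UNIQUE_LABELS = frozenset({"subj", "obj"})


def find_violations(tokens, verb_pos="VERB", unique_labels=UNIQUE_LABELS):
    """Return {verb_idx: {label: count}} for verbs with a repeated label."""
    pos_of = {t.idx: t.pos for t in tokens}
    counts = defaultdict(Counter)
    for t in tokens:
        # Count only subject/object children that attach to a verb.
        if t.deprel in unique_labels and pos_of.get(t.head) == verb_pos:
            counts[t.head][t.deprel] += 1
    return {
        head: {label: n for label, n in c.items() if n > 1}
        for head, c in counts.items()
        if any(n > 1 for n in c.values())
    }


# A made-up parser output in which "saw" wrongly receives two subjects.
sentence = [
    Token(1, "Ram", "NOUN", 3, "subj"),
    Token(2, "Sita", "NOUN", 3, "subj"),
    Token(3, "saw", "VERB", 0, "root"),
    Token(4, "movie", "NOUN", 3, "obj"),
]
violations = find_violations(sentence)
print(violations)
```

A check of this kind could be run over parser output to count how often the constraint is violated before any correction is attempted.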