From Prosodic Trees to Syntactic Trees

Andi Wu, GrapeCity Inc.
Kirk Lowery, Westminster Hebrew Institute, klowery@

Abstract
This paper describes an ongoing effort to parse the Hebrew Bible. The parser consults the bracketing information extracted from the cantillation marks of the Masoretic text. We first constructed a cantillation treebank that encodes the prosodic structures of the text. We found that many of the prosodic boundaries in the cantillation trees correspond, directly or indirectly, to the phrase boundaries of the syntactic trees we are trying to build. All the useful boundary information was then extracted to help the parser make syntactic decisions, either as hard constraints in rule application or probabilistically in tree ranking. This has greatly improved the accuracy and efficiency of the parser and reduced the amount of manual work in building a Hebrew treebank.

Introduction
The text of the Hebrew Bible (HB) has been carefully studied throughout the centuries, with detailed lexical, phonological, and morphological analysis available for every verse of HB. However, very few attempts have been made at a verse-by-verse syntactic analysis.
The only known effort in this direction is the Hebrew parser built by George Yaeger (Yaeger, 1998; 2002), but its analysis is still incomplete in the sense that not all syntactic units are recognized, and the accuracy of its trees has yet to be checked. Since a detailed syntactic analysis of HB is of interest to both linguistic and biblical studies, we launched a project to build a treebank of the Hebrew Bible. In this project, trees are automatically generated by a parser and then manually checked in a tree editor. Once a tree has been edited or approved, its phrase boundaries are recorded in a database. When the same verse is parsed again, the stored brackets force the parser to produce trees whose brackets are exactly the same as those of the manually approved trees.
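The boundary mechanisms described above (prosodic brackets used as hard constraints and as a ranking signal, and stored brackets that pin down a re-parsed verse) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' parser. Brackets are modeled as half-open token spans. A prosodic bracket is assumed to map directly onto a syntactic boundary; the indirect correspondences the paper mentions are not modeled. The agreement ratio used for ranking is a simple stand-in heuristic, not the paper's probabilistic model. All function names (`crosses`, `admissible`, `agreement`, `rank`, `matches_approved`) are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical sketch, not the authors' implementation.
# A bracket is a half-open token span (start, end) covering tokens start..end-1.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans partially overlap, i.e. neither nests inside the other."""
    (s1, e1), (s2, e2) = a, b
    return (s1 < s2 < e1 < e2) or (s2 < s1 < e2 < e1)


def admissible(candidate: Span, boundaries: Iterable[Span]) -> bool:
    """Hard constraint: reject a candidate constituent that crosses any boundary bracket."""
    return not any(crosses(candidate, b) for b in boundaries)


def agreement(tree: Set[Span], prosodic: Set[Span]) -> float:
    """Stand-in soft score: share of the tree's brackets that are also prosodic brackets."""
    if not tree:
        return 0.0
    return len(tree & prosodic) / len(tree)


def rank(trees: List[Set[Span]], prosodic: Set[Span]) -> List[Set[Span]]:
    """Order candidate trees from highest to lowest agreement with the prosodic brackets."""
    return sorted(trees, key=lambda t: agreement(t, prosodic), reverse=True)


def matches_approved(tree: Set[Span], approved: Set[Span]) -> bool:
    """For a manually approved verse, accept only a tree with an identical bracket set."""
    return tree == approved
```

For a four-token verse with prosodic brackets (0, 4), (0, 2) and (2, 4), the candidate constituent (1, 3) is rejected because it crosses (0, 2), while (0, 2) itself is admissible. Keeping the hard check and the soft score as separate functions mirrors the two uses the abstract describes: one prunes rule applications, the other only reorders complete trees.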