Efficient sentence retrieval based on syntactic structure

Ichikawa Hiroshi, Hakoda Keita, Hashimoto Taiichi and Tokunaga Takenobu
Department of Computer Science, Tokyo Institute of Technology
{ichikawa, hokoda, taiichi, take}@

Abstract

This paper proposes an efficient method of sentence retrieval based on syntactic structure. Collins proposed Tree Kernel to calculate structural similarity. However, structural retrieval based on Tree Kernel is not practical because the index table required by Tree Kernel becomes impractically large. We propose two more efficient algorithms that approximate Tree Kernel: Tree Overlapping and Subpath Set. These algorithms are more efficient than Tree Kernel because indexing is possible with practical computational resources. Experiments comparing the three algorithms showed that structural retrieval with Tree Overlapping and Subpath Set was faster than retrieval with Tree Kernel by factors of 100 and 1,000, respectively.

1 Introduction

Retrieving similar sentences has attracted much attention in recent years, and several methods have already been proposed. They are useful for many applications, such as information retrieval and machine translation. Most of the methods are based on frequencies of surface information such as words and parts of speech. These methods might work well when the goal is similarity of the topics or contents of sentences. However, even when the surface information of two sentences is similar, their syntactic structures can be completely different (Figure 1). If a translation system regards such sentences as similar, the translation would fail.
This is because conventional retrieval techniques exploit only the similarity of surface information, such as words and parts of speech, and not more abstract information such as syntactic structures.

Figure 1: Sentences similar in appearance but different in syntactic structure

Collins et al. (Collins, 2001a; Collins, 2001b) proposed Tree Kernel, a method to calculate a similarity between syntactic
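To make the contrast between surface similarity and structural similarity concrete, the following minimal Python sketch counts common subtrees using the recursion commonly attributed to Collins and Duffy, without the decay factor. Everything specific here is an assumption made for illustration and is not taken from the paper: the nested-tuple tree encoding, the example sentence and its two hand-written parses, and the function names `tree_kernel` and `normalized_kernel`. It does not reproduce Figure 1, and it does not implement Tree Overlapping or Subpath Set.

```python
import math

# Assumed encoding: a tree is a nested tuple (label, child, ...); a word is a plain string.

def nodes(tree):
    """Yield every labelled node of the tree (words are not nodes)."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from nodes(child)

def words(tree):
    """Return the words of the sentence in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]

def production(node):
    """Return the rule applied at a node; words and labels are tagged so they cannot collide."""
    rhs = tuple(("w", c) if isinstance(c, str) else ("n", c[0]) for c in node[1:])
    return (node[0], rhs)

def common_subtrees(n1, n2):
    """Number of common subtrees rooted at n1 and n2 (no decay factor)."""
    if production(n1) != production(n2):
        return 0
    result = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str):
            continue  # word children add no further subtrees
        result *= 1 + common_subtrees(c1, c2)
    return result

def tree_kernel(t1, t2):
    """Sum the common-subtree counts over all pairs of nodes."""
    return sum(common_subtrees(a, b) for a in nodes(t1) for b in nodes(t2))

def normalized_kernel(t1, t2):
    """Scale the kernel so that a tree compared with itself scores 1.0."""
    return tree_kernel(t1, t2) / math.sqrt(tree_kernel(t1, t1) * tree_kernel(t2, t2))

# Two hypothetical parses of the same word sequence (prepositional-phrase attachment).
PP = ("PP", ("P", "with"), ("NP", ("D", "a"), ("N", "telescope")))
OBJ = ("NP", ("D", "the"), ("N", "man"))
VP_ATTACH = ("S", ("NP", "I"), ("VP", ("V", "saw"), OBJ, PP))
NP_ATTACH = ("S", ("NP", "I"), ("VP", ("V", "saw"), ("NP", OBJ, PP)))

if __name__ == "__main__":
    print("same words:", words(VP_ATTACH) == words(NP_ATTACH))
    print("structural similarity:", normalized_kernel(VP_ATTACH, NP_ATTACH))
```

Both parses contain exactly the same words in the same order, so any purely word-based measure treats them as identical, while the normalized kernel stays below 1.0 because the trees share only some of their subtrees. The double loop over node pairs is also a reminder of why a direct implementation is costly when a query must be compared against every sentence in a large corpus.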