Scientific report: "Automatic Cost Estimation for Tree Edit Distance Using Particle Swarm Optimization"
Yashar Mehdad
University of Trento and FBK - Irst, Trento, Italy
mehdad@

Abstract

Recently there has been growing interest in working with tree-structured data in different applications and domains, such as computational biology and natural language processing. Moreover, many applications in computational linguistics require computing similarities over pairs of syntactic or semantic trees. In this context, Tree Edit Distance (TED) has been widely used for many years. However, one of the main constraints of this method is the need to tune the costs of the edit operations, which makes it difficult, and sometimes very challenging, to apply to complex problems. In this paper we propose an original method to estimate and optimize the operation costs in TED by applying the Particle Swarm Optimization algorithm. Our experiments on Recognizing Textual Entailment show the success of this method in automatic estimation rather than manual assignment of edit costs.

1 Introduction

Among many tree-based algorithms, Tree Edit Distance (TED) has offered solutions for various NLP applications such as information retrieval, information extraction, similarity estimation, and textual entailment. Tree edit distance is defined as the minimum-cost set of basic operations transforming one tree into another.
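To make the definition concrete, here is a minimal sketch of an ordered tree edit distance with configurable operation costs. It is not the implementation used in the paper: it is a plain memoized recursion over forests, and the tree encoding (nested `(label, children)` tuples), the function name `tree_edit_distance`, and the three cost parameters are assumptions made for illustration.

```python
from functools import lru_cache


def tree_edit_distance(t1, t2, delete_cost=1.0, insert_cost=1.0,
                       substitute_cost=1.0):
    """Ordered tree edit distance between two trees.

    A tree is encoded as (label, children), where children is a tuple of
    trees. This encoding and the uniform per-operation costs are
    illustrative assumptions, not taken from the paper.
    """

    @lru_cache(maxsize=None)
    def dist(f, g):
        # f and g are forests: tuples of trees, compared right to left.
        if not f and not g:
            return 0.0
        if not g:
            # Delete the rightmost root of f; its children are promoted.
            _, kids = f[-1]
            return dist(f[:-1] + kids, g) + delete_cost
        if not f:
            # Insert the rightmost root of g.
            _, kids = g[-1]
            return dist(f, g[:-1] + kids) + insert_cost
        (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]
        relabel = 0.0 if label_v == label_w else substitute_cost
        return min(
            dist(f[:-1] + kids_v, g) + delete_cost,   # delete root v
            dist(f, g[:-1] + kids_w) + insert_cost,   # insert root w
            # Map v to w: match their children, then the remaining forests.
            dist(kids_v, kids_w) + dist(f[:-1], g[:-1]) + relabel,
        )

    return dist((t1,), (t2,))


if __name__ == "__main__":
    a_b = ("a", (("b", ()),))
    a_c = ("a", (("c", ()),))
    print(tree_edit_distance(a_b, a_c))                       # 1.0
    print(tree_edit_distance(a_b, a_c, substitute_cost=3.0))  # 2.0
```

In the second call, raising the substitution cost makes "delete b, then insert c" cheaper than relabeling, so both the distance and the underlying edit script change. This sensitivity to the chosen costs is what motivates estimating them automatically.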
Common TED approaches use an initial fixed cost for each operation. Generally, the initial cost assigned to each edit operation depends on the nature of the nodes, the application, and the dataset. For example, the probability of deleting a function word from a string is not the same as that of deleting a symbol in an RNA structure. Tree comparison may therefore be affected by the application and the dataset. One solution to this problem is to assign the cost of each edit operation empirically or based on expert knowledge and recommendation. These methods run into a critical problem when the domain, field, or application is new and the .