tailieunhanh - Báo cáo khoa học: "Discovering Phonotactic Finite-State Automata by Genetic Search"

This paper presents a genetic algorithm based approach to the automatic discovery of finitestate a u t o m a t a (FSAs) from positive data. FSAs are commonly used in computational phonology, but - given the limited learnability of FSAs from arbitrary language subsets - are usually constructed manually. The approach presented here offers a practical automatic method that helps reduce the cost of manual FSA construction. | Discovering Phonotactic Finite-State Automata by Genetic Search Anja Belz School of Cognitive and Computing Sciences University of Sussex Brighton BN1 9QH UK email Abstract This paper presents a genetic algorithm based approach to the automatic discovery of finite-state automata FSAs from positive data. FSAs are commonly used in computational phonology but - given the limited learnability of FSAs from arbitrary language subsets - are usually constructed manually. The approach presented here offers a practical automatic method that helps reduce the cost of manual FSA construction. 1 Background Finite-state techniques have always been central to computational phonology. Definite FSAs are commonly used to encode phonotactic constraints in a variety of NLP contexts. Such FSAs are usually constructed in a timeconsuming manual process. Following notational convention an FSA is a quintuple 5 z Ớ So F in which s is a finite nonempty set of states 7 is a finite nonempty alphabet s is the state transition function So is the initial state and F is a nonempty set of final states in s. The language L accepted by the finite automaton A denoted 7 A is i 5 s0 r 6 -F . Generally the problem considered here is that of identifying a language L from a fixed finite sample D 7 D- where 7 c L and D n L 0 D may be empty . If D is empty and D is structurally complete with regard to L the problem is not complex and there exist a number of algorithms based on first constructing a canonical automaton from the data set which is then reduced or merged in various ways. If D is an arbitrary strict subset of L the problem is less clearly defined. Since any finite sample is consistent with an infinite number of languages L cannot be identified uniquely from D . .the best we can hope to do is to infer a grammar that will describe the strings in D and predict other strings that in some sense are of the same nature as those contained in D . Fu and Booth 1986 p. 345 To constrain

crossorigin="anonymous">
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.