Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure

Mark Johnson
Brown University
Mark_Johnson@

Abstract

Adaptor grammars (Johnson et al., 2007b) are a non-parametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees. In practice, this means that an adaptor grammar learns the structures useful for generating the training data as well as their probabilities. We present several different adaptor grammars that learn to segment phonemic input into words by modeling different linguistic properties of the input. One of the advantages of a grammar-based framework is that it is easy to combine grammars, and we use this ability to compare models that capture different kinds of linguistic structure. We show that incorporating both unsupervised syllabification and collocation-finding into the adaptor grammar significantly improves unsupervised word-segmentation accuracy over that achieved by adaptor grammars that model only one of these linguistic phenomena.

1 Introduction

How humans acquire language is arguably the central issue in the scientific study of language. Human language is richly structured, but it is still hotly debated whether this structure can be learnt or whether it must be innately specified. Computational linguistics can contribute to this debate by identifying which aspects of language can potentially be learnt from the input available to a child.
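The "learning the probabilities of entire subtrees" behaviour comes from caching: an adapted nonterminal either reuses a previously generated subtree (with probability proportional to how often it has been used) or generates a fresh one from the underlying PCFG. A minimal sketch of this Chinese-Restaurant-Process-style caching is below; the base distribution, subtree strings, and the function names are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import defaultdict

def crp_sample(counts, total, alpha, base_sample):
    """Chinese Restaurant Process draw: reuse a cached item with
    probability count/(total+alpha), or draw a fresh item from the
    base distribution with probability alpha/(total+alpha)."""
    r = random.random() * (total + alpha)
    for item, c in counts.items():
        r -= c
        if r < 0:
            return item
    return base_sample()

# Toy base distribution over whole subtrees (here just bracketed strings),
# standing in for the underlying PCFG's distribution.
def base():
    return random.choice(["(Word a)", "(Word b)", "(Word a b)"])

counts = defaultdict(int)
total = 0
for _ in range(100):
    tree = crp_sample(counts, total, alpha=1.0, base_sample=base)
    counts[tree] += 1
    total += 1
```

Because reuse probability grows with each use, frequently drawn subtrees are increasingly favoured, giving the rich-get-richer dynamics that let the grammar learn which whole structures (e.g. whole words) are useful, rather than treating each rule application as independent.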
Here we try to identify linguistic properties that convey information useful for learning to segment streams of phonemes into words. We show that simultaneously learning syllable structure and collocations improves word segmentation accuracy compared to models that learn these independently. This suggests that there might be a synergistic interaction in learning several aspects of linguistic structure simultaneously, as compared to learning each kind of linguistic structure independently.
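A combined syllable-plus-collocation model of the kind described can be written as a small set of adaptor grammar rules along the following lines. This is an illustrative sketch rather than the paper's exact grammar; nonterminals marked "adapted" are the ones whose subtrees the model caches.

```
Sentence -> Colloc
Sentence -> Colloc Sentence
Colloc   -> Words            # adapted: caches collocations (word sequences)
Words    -> Word
Words    -> Word Words
Word     -> Syllables        # adapted: caches whole words
Syllables -> Syllable
Syllables -> Syllable Syllables
Syllable -> Onset Rhyme
Syllable -> Rhyme
Rhyme    -> Nucleus
Rhyme    -> Nucleus Coda
Onset    -> Consonants
Nucleus  -> Vowels
Coda     -> Consonants
```

Because each linguistic phenomenon corresponds to a few rules, grammars modeling syllabification only, collocations only, or both can be obtained by adding or removing rule subsets, which is the combinability the abstract refers to.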