tailieunhanh - manning schuetze statisticalnlp part 4

The noun phrases Peter and He in sentence () and the other passenger and the man in sentence () refer to the same person. Resolving anaphoric relations is important for information extraction. In information extraction,

184 5 Collocations

and whose exact and unambiguous meaning or connotation cannot be derived directly from the meaning or connotation of its components.

Most of the examples we have presented in this chapter also assumed adjacency of words. But in most linguistically oriented research, a phrase can be a collocation even if it is not consecutive, as in the example "knock ... door". The following criteria are typical of linguistic treatments of collocations (see, for example, Benson 1989 and Brundage et al. 1992), non-compositionality being the main one we have relied on here.

Non-compositionality. The meaning of a collocation is not a straightforward composition of the meanings of its parts. Either the meaning is completely different from the free combination (as in the case of idioms like "kick the bucket"), or there is a connotation or added element of meaning that cannot be predicted from the parts. For example, "white wine", "white hair", and "white woman" all refer to slightly different colors, so we can regard them as collocations.

Non-substitutability. We cannot substitute other words for the components of a collocation, even if in context they have the same meaning. For example, we can't say "yellow wine" instead of "white wine", even though yellow is as good a description of the color of white wine as white is (it is kind of a yellowish white).

Non-modifiability. Many collocations cannot be freely modified with additional lexical material or through grammatical transformations. This is especially true for frozen expressions like idioms. For example, we can't modify "frog" in "to get a frog in one's throat" into "to get an ugly frog in one's throat", although usually nouns like "frog" can be modified by adjectives like "ugly".
Similarly, going from singular to plural can make an idiom ill-formed, for example in "people as poor as church mice".

A nice way to test whether a combination is a collocation is to translate it into another language. If we cannot translate the combination word by word, then that