Finding Parts in Very Large Corpora

Matthew Berland and Eugene Charniak
{mb,ec}@cs.brown.edu
Department of Computer Science, Brown University
Box 1910, Providence, RI 02912

Abstract

We present a method for extracting parts of objects from wholes (e.g., "speedometer" from "car"). Given a very large corpus, our method finds part words with 55% accuracy for the top 50 words as ranked by the system. The part list could be scanned by an end-user and added to an existing ontology (such as WordNet), or used as a part of a rough semantic lexicon.

1 Introduction

We present a method of extracting parts of objects from wholes (e.g., "speedometer" from "car"). To be more precise, given a single word denoting some entity that has recognizable parts, the system finds and rank-orders other words that may denote parts of the entity in question. Thus the relation found is, strictly speaking, between words, a relation Miller [1] calls "meronymy." In this paper we use the more colloquial "part-of" terminology.

We produce words with 55% accuracy for the top 50 words ranked by the system, given a very large corpus. Lacking an objective definition of the part-of relation, we use the majority judgment of five human subjects to decide which proposed parts are correct.

The program's output could be scanned by an end-user and added to an existing ontology (e.g., WordNet), or used as a part of a rough semantic lexicon.

To the best of our knowledge, there is no published work on automatically finding parts from unlabeled corpora. Casting our nets wider, the work most similar to what we present here is that by Hearst [2] on acquisition of hyponyms ("isa" relations). In that paper Hearst (a) finds lexical correlates to the hyponym relation by looking in text for cases where known hyponyms appear in proximity (e.g., in the construction (NP, NP and (NP other NN)), as in "boats, cars, and other vehicles"), (b) tests the proposed patterns for validity, and (c) uses them to extract relations from a corpus. In this paper we apply much the same methodology to the part-of relation. Hearst states that she tried to apply this strategy to the part-of relation, but failed. We comment later on the differences in our approach that we believe were most important to our comparative success.

Looking more widely still, there is an ever-growing literature on the use of statistical/corpus-based
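
To make this pattern-based methodology concrete, here is a minimal sketch of step (a): matching an "and other" construction over raw text to propose candidate hyponym pairs and tallying them. The regular expression, the function names, and the raw-frequency ranking are illustrative assumptions only, not the system described in this paper.

import re
from collections import Counter

# Toy pattern for the construction "NP, NP, and other NP"
# (e.g., "boats, cars, and other vehicles"); single words stand in for NPs.
PATTERN = re.compile(r"\b(\w+), (\w+),? and other (\w+)\b", re.IGNORECASE)

def propose_hyponyms(text):
    """Yield (hyponym, hypernym) candidate pairs found in raw text."""
    for m in PATTERN.finditer(text):
        hypernym = m.group(3).lower()
        for hyponym in (m.group(1), m.group(2)):
            yield hyponym.lower(), hypernym

def rank_candidates(corpus_lines):
    """Tally candidate pairs over a corpus and rank them by raw frequency,
    a crude stand-in for a statistical ranking."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(propose_hyponyms(line))
    return counts.most_common()

if __name__ == "__main__":
    sample = ["They sell boats, cars, and other vehicles at the auction."]
    print(rank_candidates(sample))
    # [(('boats', 'vehicles'), 1), (('cars', 'vehicles'), 1)]

In practice one would match over tagged or parsed text and apply a more careful statistical ranking before handing the ordered candidate list to human judges, as this paper does for the part-of relation.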