Lecture Notes in Computer Science - P110
This year, we received about 170 submissions to ICWL 2008. There were a total of 52 full papers, representing an acceptance rate of about 30%, plus one invited paper accepted for inclusion in this LNCS proceedings. The authors of these accepted papers

534 Chang et al.

if it occurs frequently within a text but infrequently in the larger collection. The formula is as follows:

$$W_{ij} = Tf_{ij} \cdot \log_2\left(\frac{N}{n}\right) \qquad (2)$$

where
$W_{ij}$ = weight of term $T_j$ in document $D_i$
$Tf_{ij}$ = frequency of term $T_j$ in document $D_i$
$N$ = number of documents in the collection
$n$ = number of documents where $T_j$ occurs at least once

The "document" in the formula must be reinterpreted as an individual course package. In our case the collection is the learning content repository, and the parameter n requires knowledge of all the words in the collection that holds the text material of interest. To calculate each word's importance, we need to construct a dictionary that records how frequently the word occurs across the course packages in the learning content repository.

Fig. 3. Construct a dictionary of weighted words

Figure 3 shows each step of constructing the dictionary of weighted words from our learning content repository. The process begins with the content parser, which fetches learning courses from the repository and extracts all the words from each course except frequent stop words such as "is" and "are". A counter module then tags each unique extracted word with a number and keeps track of the number of courses in which the word occurred. Once counting is complete, words whose count across all the courses is below a chosen threshold value are eliminated. The threshold must be tuned because it depends on the size of the repository. If the value is too small, too many insignificant words are kept; if it is too large, rare words that may be quite important and have the potential to become keywords are likely to be removed.
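As a rough illustration of formula (2) and of the counting and pruning steps, the following Python sketch treats each course package as a plain string. The sample packages, the stop-word list, the threshold of 2, and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the text names only a few such as "is" and "are".
STOP_WORDS = {"is", "are", "and", "the", "a", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def document_frequencies(packages):
    """Count, for every word, the number of packages it occurs in (n)."""
    df = Counter()
    for text in packages:
        df.update(set(tokenize(text)))  # set(): count each package once
    return df

def prune(df, threshold):
    """Keep only words that occur in at least `threshold` packages."""
    return {w: n for w, n in df.items() if n >= threshold}

def weight(term, package, df, total_packages):
    """W_ij = Tf_ij * log2(N / n) for one term in one package."""
    tf = tokenize(package).count(term)
    n = df.get(term, 0)
    if tf == 0 or n == 0:
        return 0.0
    return tf * math.log2(total_packages / n)

# Toy repository of four course packages (made-up data).
packages = [
    "Adaptive learning is adaptive and personal",
    "Learning objects are reusable",
    "The repository stores learning objects and metadata",
    "Metadata standards",
]
df = document_frequencies(packages)
kept = prune(df, threshold=2)
w_adaptive = weight("adaptive", packages[0], df, len(packages))
w_learning = weight("learning", packages[0], df, len(packages))
```

In this toy data, "adaptive" occurs twice in the first package and in no other package, so its weight is 2 * log2(4 / 1) = 4.0, while "learning" occurs in three of the four packages and receives a much lower weight. A threshold of 2 nevertheless prunes "adaptive" from the dictionary, which shows why the threshold has to be tuned to the size of the repository.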
The remaining words are passed through a spell checker, and finally words that share the same grammatical stem are combined into single dictionary entries. For example, "adaptive" and "adapted" would share an entry in the dictionary. Accordingly, the dictionary becomes smaller at each step.

A Semiautomatic Content .
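The stem-merging step can be sketched in Python as follows. The suffix list and the `naive_stem` and `merge_by_stem` helpers are hypothetical and deliberately crude; the text does not say which stemmer is used, and a real system would more likely rely on an established algorithm such as Porter's. Summing the counts of merged words is also a simplification, since a course containing both forms would be counted twice.

```python
from collections import defaultdict

# Naive, hypothetical suffix list; checked in this order.
SUFFIXES = ("ive", "ing", "ed", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def merge_by_stem(counts):
    """Combine the counts of words that share a stem into one entry."""
    merged = defaultdict(int)
    for word, count in counts.items():
        merged[naive_stem(word)] += count
    return dict(merged)

# Made-up word counts taken from an imagined dictionary.
counts = {"adaptive": 3, "adapted": 2, "course": 5, "courses": 4}
dictionary = merge_by_stem(counts)
```

Here "adaptive" and "adapted" both reduce to the stem "adapt" and end up in one entry, so the four input words collapse into two dictionary entries.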