Graduation Thesis Computer Science: Finding the semantic similarity in Vietnamese

Our thesis examines the quality of semantic vector representations built with random projection and the Hyperspace Analogue to Language (HAL) model, applied to Vietnamese. The main goal is to find semantic similarity between words, that is, to study synonyms in Vietnamese. We are also interested in the stability of our approach, which uses Random Indexing and HAL to represent the semantics of words or documents. We built a system for finding Vietnamese synonyms, called the Semantic Similarity Finding System, and we evaluate the synonyms it produces.

As a first direction for future work: our Semantic Similarity Finding System currently builds a geometric words-by-words co-occurrence matrix from which context vectors are computed. Co-occurrences are counted symmetrically in both directions within the window (three words to the left and three words to the right of the target word); this is only one of several possible ways to compute the co-occurrence matrix. Because of this, synonyms are difficult to find for some words, especially pronouns. In our Experiment 2, pronouns did not give good results; a possible reason is that pronouns often occur at the beginning or end of a sentence. In addition, unlike in English, Vietnamese adjectives are always placed after the nouns they modify. Therefore, to obtain more accurate synonyms for adjectives, we need to build a left-directional words-by-words co-occurrence matrix. We plan to add a new mode to the system for computing a co-occurrence matrix in which rows contain left-context co-occurrences and columns contain right-context co-occurrences. We expect this to yield more accurate synonyms for Vietnamese vocabulary.
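The Random Indexing step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis system's implementation: the corpus is assumed to be already word-segmented (multi-syllable Vietnamese words joined with underscores), and the dimensionality `dim`, the number of non-zero entries `nonzeros`, the seed, and all function names are hypothetical choices. Each word receives a sparse random index vector of +1 and -1 entries; a word's context vector is the sum of the index vectors of its neighbours within the same three-words-each-side window, and candidate synonyms are ranked by cosine similarity.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector stored as (position, sign) pairs.
    Signs alternate +1, -1 (equal numbers when `nonzeros` is even)."""
    positions = rng.sample(range(dim), nonzeros)
    return [(pos, 1 if k % 2 == 0 else -1) for k, pos in enumerate(positions)]


def random_indexing(sentences, dim=1000, nonzeros=10, window=3, seed=0):
    """Return a dense context vector per word: the sum of the index vectors
    of all words within `window` positions on either side (hypothetical defaults)."""
    rng = random.Random(seed)
    index = {}
    for sent in sentences:
        for word in sent:
            if word not in index:
                index[word] = make_index_vector(dim, nonzeros, rng)

    context = defaultdict(lambda: [0] * dim)
    for sent in sentences:
        for i, target in enumerate(sent):
            vec = context[target]  # every word gets a vector, even with no neighbours
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, sign in index[sent[j]]:
                    vec[pos] += sign
    return dict(context)


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(vectors, word, top_n=5):
    """Rank the other words by cosine similarity to `word`."""
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

For instance, in a toy corpus where `đẹp` and `xinh` ("beautiful", "pretty") occur in identical contexts, their context vectors coincide and `most_similar(vecs, "đẹp")` ranks `xinh` first. Storing index vectors as (position, sign) pairs keeps each update proportional to `nonzeros` rather than `dim`.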
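The proposed directional mode can be sketched with a single left-context count table in the usual HAL layout: entry `m[t][w]` counts how often `w` occurs within the window to the left of target `t`, so the row of `t` holds its left context and the column of `t` (the entries `m[w][t]`) holds its right context. Adding the row and the column recovers the symmetric count over three words on each side. This is a hedged sketch, not the thesis system: tokens are assumed pre-segmented, the function names are hypothetical, counts are unweighted by default, and the optional `weighted=True` linear distance weighting is the conventional HAL scheme, which the thesis does not state it uses.

```python
from collections import defaultdict


def left_cooccurrence(sentences, window=3, weighted=False):
    """Directional (HAL-style) count table.

    m[t][w] accumulates how often word w appears within `window` positions
    to the LEFT of target t. Row t = left context of t; column t = right context.
    """
    m = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, target in enumerate(sent):
            for d in range(1, window + 1):
                j = i - d
                if j < 0:
                    break  # reached the start of the sentence
                # Optional linear distance weighting (assumed HAL convention):
                # the adjacent word gets `window`, the farthest gets 1.
                m[target][sent[j]] += (window - d + 1) if weighted else 1.0
    return m


def left_context(m, word):
    """Row of `word`: words seen before it, with their counts."""
    return dict(m.get(word, {}))


def right_context(m, word):
    """Column of `word`: words w for which `word` appeared on w's left."""
    return {w: row[word] for w, row in m.items() if word in row}


def symmetric_context(m, word):
    """Row + column: the symmetric count over both sides of the window."""
    ctx = defaultdict(float, left_context(m, word))
    for w, c in right_context(m, word).items():
        ctx[w] += c
    return dict(ctx)
```

On a toy corpus such as `[["con_mèo", "trắng"], ["cái_áo", "trắng"], ["con_mèo", "đen"]]`, `left_context(m, "trắng")` returns the nouns preceding the adjective (`con_mèo`, `cái_áo`), which is the signal a left-directional matrix isolates for post-nominal adjectives. The same table also shows that a sentence-initial word gains nothing in its own row from that sentence.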