Text classification based on support vector machine
The development of the Internet has increased the need for daily online information storage. Finding the information we are interested in takes a lot of time, so techniques for organizing and processing text data are needed. These techniques are called text classification or text categorization. There are many methods of text classification; in this paper we study and apply the Support Vector Machine (SVM) method and compare its effectiveness with that of the Naïve Bayes probabilistic method. In addition, before performing text classification, we preprocessed the training set by extracting keywords with dimensionality reduction techniques to reduce the time needed for classification.

DALAT UNIVERSITY JOURNAL OF SCIENCE, Volume 9, Issue 2, 2019, 3-19
TEXT CLASSIFICATION BASED ON SUPPORT VECTOR MACHINE
Le Thi Minh Nguyen(a)
(a) The Faculty of Information Technology, Hochiminh City University of Foreign Languages - Information Technology, Hochiminh City, Vietnam
Corresponding author: Email: nguyenltm@
Article history: Received December 15th, 2018; Received in revised form January 29th, 2019; Accepted February 14th, 2019
Keywords: Feature vector; Kernel; Naive Bayes; Support Vector Machine; Text classification.
DOI http 2019
Article type: (peer-reviewed) Full-length research article
Copyright 2019 The author(s).
Licensing: This article is licensed under a CC BY-NC-ND 3

DALAT UNIVERSITY JOURNAL OF SCIENCE [NATURAL SCIENCES AND TECHNOLOGY]
TEXT CLASSIFICATION BASED ON SUPPORT VECTOR MACHINE
Lê Thị Minh Nguyện(a)
(a) Faculty of Information Technology, Ho Chi Minh City University of Foreign Languages - Information Technology, Ho Chi Minh City, Vietnam
Corresponding author: Email: nguyenltm@
Article history: Received December 15, 2018; Revised January 29, 2019; Accepted for publication February 14, 2019
Abstract: The development of the Internet has caused the amount of information stored online to grow rapidly every day. Therefore, to find the correct information