Application of machine learning to fill in the missing monitoring data of air quality

Vietnam Journal of Science and Technology 56 (2C) (2018) 104-110

APPLICATION OF MACHINE LEARNING TO FILL IN THE MISSING MONITORING DATA OF AIR QUALITY

Mac Duy Hung1, 2, *, Nghiem Trung Dung1, Hoang Xuan Co3

1 Hanoi University of Science and Technology, 1 Dai Co Viet road, Ha Noi, Viet Nam
2 Thai Nguyen University of Technology, 3/2 road, Tich Luong ward, Thai Nguyen, Viet Nam
3 VNU University of Science, 334 Nguyen Trai road, Ha Noi, Viet Nam

* Email: macdh@

Received: 10 May 2018; Accepted for publication: 21 August 2018

ABSTRACT

In this paper, three machine learning models, Autoregressive Moving Average (ARMA), Artificial Neural Network (ANN), and Support Vector Regression (SVR), were applied to predict and fill in missing air quality monitoring data for the Gia Lam station in Hanoi and the Nha Trang station in Khanh Hoa. Two air pollutants, NO2 and PM10, were selected for this study. The experimental results showed that all three models performed better than some traditional approaches, including Multiple Linear Regression (LR) and spline interpolation. In addition, ARMA, ANN and SVR were able to capture the fluctuation of the concentrations of the selected pollutants. These results indicate that machine learning is a feasible approach to handling missing data, which is one of the biggest problems of air quality monitoring stations in Viet Nam.

Keywords: air quality, ANN, ARMA, SVR, missing data.

1. INTRODUCTION

Monitoring and modeling of air quality are of great significance for understanding the trends and characteristics of air pollutants. Understanding and simulating the fluctuation of an air pollutant requires an air quality dataset that is not only reliable and long enough in time but also complete as a time series of observations. However, the continuity of time-series measurements is normally disrupted by different factors, including malfunction of the equipment, .
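
The gap-filling idea described in the abstract can be illustrated with a very small sketch. The code below is not the authors' ARMA, ANN, or SVR implementation. It assumes only the autoregressive part of an ARMA model at lag 1, x[t] = c + phi * x[t-1], fitted by ordinary least squares on adjacent observed pairs, with missing readings marked as None. The function names fit_ar1 and fill_missing_ar1 and the sample PM10 values are hypothetical and chosen purely for illustration.

```python
from typing import List, Optional, Tuple


def fit_ar1(series: List[Optional[float]]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1].

    Only pairs where both x[t-1] and x[t] were observed are used.
    """
    pairs = [(prev, cur) for prev, cur in zip(series, series[1:])
             if prev is not None and cur is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two adjacent observed pairs")

    n = len(pairs)
    mean_prev = sum(p for p, _ in pairs) / n
    mean_cur = sum(c for _, c in pairs) / n
    sxx = sum((p - mean_prev) ** 2 for p, _ in pairs)
    sxy = sum((p - mean_prev) * (c - mean_cur) for p, c in pairs)

    # If all predecessors are identical the slope is undefined; fall back to 0.
    phi = sxy / sxx if sxx > 0 else 0.0
    c = mean_cur - phi * mean_prev
    return c, phi


def fill_missing_ar1(series: List[Optional[float]]) -> List[float]:
    """Return a copy of the series with each None replaced by an AR(1) prediction."""
    c, phi = fit_ar1(series)
    observed = [v for v in series if v is not None]
    fallback = sum(observed) / len(observed)  # used only if the first value is missing

    filled = list(series)
    for t, value in enumerate(filled):
        if value is None:
            # filled[t - 1] is always a number here because gaps are filled left to right.
            filled[t] = c + phi * filled[t - 1] if t > 0 else fallback
    return filled


if __name__ == "__main__":
    # Hypothetical hourly PM10 readings with two gaps (not data from the paper).
    pm10 = [52.0, 55.0, None, 61.0, 58.0, None, 50.0, 47.0]
    print(fill_missing_ar1(pm10))
```

Predictions for consecutive gaps are chained, so errors accumulate over long gaps. This is one reason a study like this one compares several models against simple baselines such as linear regression and spline interpolation.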
