Clustered K nearest neighbor algorithm for daily inflow forecasting


Abstract

Instance-based learning (IBL) algorithms are a common choice among data-driven algorithms for inflow forecasting. They rest on the similarity principle: a prediction is made from a finite number of similar neighbors. The similarity of a query instance is estimated from the closeness of its feature vector to those of the points in the calibration data. Because the attributes in the feature vector are selected over the calibration data as a whole, there may be some data points whose outputs do not follow the selected attributes. In fact, the output values of these inconsistent data points may be a function of other attributes that were not considered. Therefore, for some query instances, inconsistent points may appear as neighbors even though they are not truly neighbors of the query instance. They can degrade forecasting results, especially if they lie very close to the query instance under the current similarity definition. This study introduces a clustered K nearest neighbor (CKNN) algorithm that can capture these inconsistent data points. Just as it handles inconsistent data points, CKNN can also be robust against noisy data. The proposed algorithm was shown to be effective on a synthetic linear data set corrupted by noise. In addition, the utility of the algorithm was demonstrated for daily inflow forecasting of the Karoon1 reservoir in Iran.
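
As a rough illustration of the problem described above, the following sketch implements a plain K nearest neighbor forecast (not the CKNN algorithm itself, whose clustering step is not specified in this abstract). The function name `knn_forecast`, the single lagged-inflow feature, and all data values are hypothetical and chosen only to show how one inconsistent point lying close to the query can distort the forecast.

```python
import math

def knn_forecast(query, features, targets, k=2):
    """Average the targets of the k calibration points closest to the query."""
    # Euclidean distance from the query feature vector to each calibration vector
    dists = [math.dist(query, f) for f in features]
    # Indices of the k calibration points with the smallest distances
    nearest = sorted(range(len(features)), key=lambda i: dists[i])[:k]
    return sum(targets[i] for i in nearest) / k

# Hypothetical calibration data: feature = previous-day inflow, target = next-day inflow.
# The first four points follow target = feature + 1; the last one does not
# (its output is assumed to depend on an attribute missing from the feature vector).
features = [(10.0,), (12.0,), (13.5,), (16.0,), (13.0,)]
targets = [11.0, 13.0, 14.5, 17.0, 40.0]

query = (13.0,)
with_inconsistent = knn_forecast(query, features, targets, k=2)
without_inconsistent = knn_forecast(query, features[:-1], targets[:-1], k=2)

print(with_inconsistent, without_inconsistent)
```

In this toy setting the inconsistent point is the closest neighbor of the query, so it pulls the forecast far from the value suggested by the remaining points. This is the kind of neighbor the abstract says CKNN is designed to capture.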
