Time series predictions for bank account balances

Bachelor thesis (2016)

Authors

B.L.L. Kreynen

C. Olieman

F.A. van Doorn

M. Spanoghe

Contributors

T.E.P.M.F. Abeel (mentor)

Department

Computer Science (TU Delft)

To reference this document use:

http://resolver.tudelft.nl/uuid:9b1d0211-6180-418b-a341-a65c3f0d7e7b

Published Date

24-06-2016

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Abstract

For our bachelor project we used machine learning to predict account balances for a large Dutch bank holding company. The company's main interest is the integration of machine learning techniques into its systems. To enable this, we were asked to develop a product that predicts account balances for the clients of associated banks. With the client's interest in machine learning in mind, we developed a framework that lets the user implement both machine learning and non-machine-learning models. The framework makes it easy to compare the implemented models using different error measures, parameters and inputs, and lets the user visualize the results. In this framework we implemented our own models for account balance prediction. To compare our models we started by implementing a baseline; next to this baseline we implemented two non-machine-learning models and one machine learning model. The data we used to train and validate our models was derived from the client's data warehouse. We segmented the accounts on criteria such as activity and the length of time they have been with the bank. After that we normalized the data so that it could be interpreted and processed more easily. The machine learning techniques we wanted to implement require a lot of training examples, which made us decide to implement a clustering model as well, to create more data to train our models on. Eventually the clustering did not give us the expected results and we decided not to use it for our final model. To give our client a suitable recommendation about which machine learning library to use on their systems, we implemented the same clustering method with two different libraries. After this comparison we were able to recommend the Scikit-learn library to our client over the lower-level TensorFlow library. From this point on we also used the Scikit-learn library for the implementation of the SVM model.
For the regression we implemented the L-1 prediction, the OLS method and an SVM. Compared to the baseline, our SVM model gave the best results; however, the results of the L-1 prediction closely followed those of our SVM model. On closer comparison we discovered that in some cases the SVM model makes a prediction that is almost exactly the same as the L-1 prediction, while on the other hand various other predictions do not follow this pattern at all. We therefore assume that with more tuning the SVM will perform better and show significantly better results than the L-1 prediction. We did not have time to tune our SVM extensively, but we did try different inputs and parameters. As a future improvement, these parameters can be tested in more detail, and it would be interesting to take a closer look at different normalization methods and error measures. In conclusion, we were able to test machine learning techniques with the client's data by implementing a well-working SVM model for account balance prediction. This model works on the client's systems and is validated on real client data. Furthermore, we provided our client with a framework that allows them to easily implement machine learning and non-machine-learning models. This framework provides the user with interfaces to build models, standard data operations and error measures. This allows the user to quickly research many different configurations. We used this framework ourselves during this project to compare our machine learning and non-machine-learning models.
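The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code, and it makes several assumptions: "L-1 prediction" is taken to mean "predict the last observed balance", the OLS model is reduced to a regression of each balance on the previous one, mean absolute error stands in for the unspecified error measures, and the balances are synthetic. All function names are hypothetical, and the SVM is left out to keep the sketch free of dependencies.

```python
from statistics import mean


def min_max_normalize(series, reference):
    """Scale `series` using the min and max of `reference` (the training window)."""
    lo, hi = min(reference), max(reference)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in series]


def l1_predict(history):
    """Assumed meaning of the L-1 prediction: repeat the last observed value."""
    return history[-1]


def ols_lag1_predict(history):
    """Fit y_t = a + b * y_(t-1) by ordinary least squares, then predict one step ahead."""
    xs, ys = history[:-1], history[1:]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return intercept + slope * history[-1]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def walk_forward(series, n_test, predict):
    """One-step-ahead predictions for the last `n_test` points of `series`."""
    start = len(series) - n_test
    return [predict(series[:t]) for t in range(start, len(series))]


# Synthetic balances with a steady upward trend (illustrative data only).
balances = [100.0 + 10.0 * day for day in range(20)]
n_test = 5

# Normalize with statistics from the training window only, to avoid leakage.
scaled = min_max_normalize(balances, balances[:-n_test])
actual = scaled[-n_test:]

l1_mae = mean_absolute_error(actual, walk_forward(scaled, n_test, l1_predict))
ols_mae = mean_absolute_error(actual, walk_forward(scaled, n_test, ols_lag1_predict))

print(f"L-1 MAE: {l1_mae:.4f}")
print(f"OLS MAE: {ols_mae:.4f}")
```

On this trending toy series the L-1 prediction always lags one step behind, while the lag-1 OLS fit captures the trend. Real account balances are far noisier, which may be why the abstract reports the L-1 prediction staying close to the SVM.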

Files

Final report - Time serie pred... (pdf)

(pdf | 1.52 Mb)