D.J. van der Werf
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
One Step Ahead
A weakly-supervised approach to training robust machine learning models for transaction monitoring
Master thesis
(2021)
D.J. van der Werf, J. Yang, G.J.P.M. Houben, L. Cavalcante Siebert, A.M.A. Balayn, Ali El Hassouni
In recent years, financial fraud has grown substantially as electronic financial services have opened many doors for fraudsters. Consequently, the fraud detection industry has grown significantly in scale, but it moves slowly compared to the ever-changing nature of fraudulent behavior. As the monetary losses associated with financial fraud continue to grow, so does the need for efficient automated decision-making systems. Simple decision-making rules are often still the industry standard and only show decent results in the short term, as reverse-engineering such rules is an easy task for smart fraudsters. Supervised learning systems used as automated fraud detectors have shown promising results, but are plagued by challenges that are especially prevalent in this field. The severe class imbalance of fraudulent transactions, together with fraudsters continually adopting new schemes, makes training robust and generally applicable machine learning models an arduous task. This work introduces a novel machine learning pipeline that uses carefully selected synthetic samples of the minority (fraud) class to augment the training dataset of the supervised model. Synthetic samples representing fraudulent transactions are filtered using a novel technique that quantifies their expected performance as adversarial examples, combining data-driven and human-expert-driven techniques. By providing the supervised model with high-quality synthetic adversarial examples, we aim to improve its generalizability to never-before-seen fraudulent behavior and, in turn, its robustness to the volatile nature of financial fraud. Our results show that weakly-supervised models trained on our augmented datasets detect 7% more fraudulent transactions than a baseline model trained on the standard dataset, at the cost of a 1% increase in false positives.
Our calculations further show that applying this system could decrease the monetary losses incurred by financial fraud by 1/6.
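The abstract does not spell out how synthetic fraud samples are generated or scored, so the following is only a minimal sketch of the general idea of augmenting the minority class with filtered synthetic samples. It swaps in two stand-in techniques that are assumptions, not the thesis's method: SMOTE-style interpolation between known fraud rows, and a filter that keeps only candidates a baseline detector would miss. The feature layout `[amount, hour]`, the rule in `baseline_score`, and all function names are hypothetical.

```python
import random


def interpolate(a, b, t):
    """Return the point a fraction t of the way from row a to row b."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def make_synthetic(fraud_rows, n, rng):
    """Generate n candidates by interpolating random pairs of fraud rows.

    This is a SMOTE-style stand-in, not the generator used in the thesis.
    """
    candidates = []
    for _ in range(n):
        a, b = rng.sample(fraud_rows, 2)
        candidates.append(interpolate(a, b, rng.random()))
    return candidates


def select_hard_examples(candidates, score_fn, threshold):
    """Keep candidates scored below the threshold by the baseline detector,
    i.e. synthetic fraud that the baseline would fail to flag."""
    return [c for c in candidates if score_fn(c) < threshold]


def baseline_score(row):
    """Hypothetical rule-based detector: flags only large amounts."""
    amount, _hour = row
    return 1.0 if amount > 900 else 0.0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
fraud = [[1000.0, 2.0], [950.0, 3.0], [400.0, 23.0], [300.0, 1.0]]

candidates = make_synthetic(fraud, 20, rng)
hard = select_hard_examples(candidates, baseline_score, threshold=0.5)

# The augmented minority class would then be merged with the legitimate
# transactions to retrain the supervised model.
augmented_fraud = fraud + hard
print(len(candidates), len(hard), len(augmented_fraud))
```

In this sketch the filter plays the role of the "expected performance as an adversarial example" score; the human-expert-driven part mentioned in the abstract is not modeled.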
Bachelor thesis
(2018)
Sytze Andringa, Job Zoon, Daan van der Werf, Matthijs Spaan, Wessel Van, Huijuan Wang
One of the greatest challenges in marketing is measuring the return on investment of a marketing campaign and translating that into a strategy. Companies spend a lot of money on marketing without knowing how effective certain marketing campaigns are. To solve this problem for bunq, we use machine learning to create a marketing attribution system that outputs the optimal parameters for advertisements, based on data from all previous bunq advertisements. This tool can be used by the marketing department of bunq to increase its efficiency. The marketing attribution project consists of three parts: the machine learning model itself, the input data of the machine learning model, and the system through which people can get output from the model. The machine learning model is created by a data scientist at bunq. The model uses supervised learning, a method that uses a set of annotated training data as a supervisor for learning patterns. We specifically make use of deep learning models that use regression to find either the expected number of clicks or the cost per acquisition of an advertisement. The results of these models are presented as a JSON file containing the best n advertisement options and their features. The input data to train the machine learning model was created by us. One component of the input data is the set of so-called touchpoints from Adjust. Adjust is an advertisement tracking company that helps bunq gather data about all online encounters people had with bunq, such as clicks on bunq advertisements or visits to the bunq website. The Adjust data gives the machine learning model information about how often an advertisement has been seen or clicked on, but it does not give information about how efficient an advertisement was in terms of gained users. To solve this, we wrote an algorithm that anonymously matches the Adjust data to user data in the bunq database, based on IP address and timestamps.
The more links an advertisement has with users, the more efficient it is, since it has been part of a process that convinced many people to become bunq users. With this input data the machine learning model can be trained. The second part of the project is creating a connection to the machine learning model in such a way that the marketing department can use it. We created a Python server that accepts calls from the bunq backend and sends the calls to the model, which is written in Java. It then passes the response of the model back to the bunq backend. In the Python server, we use a Bayesian technique to determine the best inputs for the marketing attribution machine learning model, to finally get the best possible parameters for a certain advertisement. All code in the backend is written in PHP and JSON in a very clear Model View Controller structure, following strict bunq coding guidelines. Testing is done with PHPUnit tests.
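The matching step described above (linking Adjust touchpoints to users by IP address and timestamp, then counting links per advertisement) could look roughly like the sketch below. Everything specific here is an assumption for illustration: the record fields `ad_id`, `ip`, and `ts`, the seven-day window, and the use of a SHA-256 digest as a stand-in for whatever anonymization the authors applied. It is not the algorithm from the thesis.

```python
import hashlib
from collections import Counter

# Hypothetical attribution window: a touchpoint counts if it happened
# at most this many seconds before the signup.
WINDOW_SECONDS = 7 * 24 * 3600


def anonymize(ip):
    """Replace an IP address with a digest so raw addresses are not compared.

    A plain hash is only a placeholder for a real anonymization scheme.
    """
    return hashlib.sha256(ip.encode("utf-8")).hexdigest()


def count_links(touchpoints, signups, window=WINDOW_SECONDS):
    """Count, per advertisement, how many signups it can be linked to."""
    by_ip = {}
    for tp in touchpoints:
        by_ip.setdefault(anonymize(tp["ip"]), []).append(tp)

    links = Counter()
    for signup in signups:
        for tp in by_ip.get(anonymize(signup["ip"]), []):
            # Link only touchpoints that precede the signup within the window.
            if 0 <= signup["ts"] - tp["ts"] <= window:
                links[tp["ad_id"]] += 1
    return links


# Toy data using documentation-reserved IP ranges; timestamps are in seconds.
touchpoints = [
    {"ad_id": "ad_a", "ip": "203.0.113.5", "ts": 1000},
    {"ad_id": "ad_b", "ip": "203.0.113.5", "ts": 2000},
    {"ad_id": "ad_a", "ip": "198.51.100.7", "ts": 1500},
    {"ad_id": "ad_b", "ip": "198.51.100.7", "ts": 9_000_000},  # after signup
]
signups = [
    {"ip": "203.0.113.5", "ts": 5000},
    {"ip": "198.51.100.7", "ts": 6000},
]

links = count_links(touchpoints, signups)
print(links.most_common())
```

Under these assumptions, the per-advertisement link count is the efficiency signal that would be passed to the model as part of its training data.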