C.J.H. Bilstra
Please note: This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
2 records found
FATE
Fuzzing for Adversarial examples in Tree Ensembles
Machine learning models are growing in popularity and are now used in a wide range of critical applications in fields such as automotive, aviation, and medicine. Among them, tree ensemble models are a popular choice because of their competitive performance and high degree of explainability. Like most machine learning models, however, they are vulnerable to adversarial examples: slightly perturbed inputs for which the model makes an unexpected prediction. These can be seen as bugs in the model, and in critical applications such a bug may have a high impact. We investigate whether fuzzers, a popular and effective tool for finding bugs in software, can also be used to find bugs (adversarial examples) in tree ensemble models.
We introduce FATE, a tool based on grey-box fuzzers that is able to find adversarial examples on a multitude of datasets. FATE uses a custom mutator that leverages domain information, model-specific information such as splitting thresholds, and dataset-specific information such as training samples. With it, FATE finds good adversarial examples: for non-image classification models they are within 1 percentage point of the examples generated by the state of the art (Zhang et al., 2020). However, the coverage guidance of grey-box fuzzers actually limits the performance of FATE: running FATE's mutator as a (1+1) Evolutionary Algorithm makes FATE competitive with the state of the art, and it even outperforms it on some datasets.
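As a rough illustration of the (1+1) Evolutionary Algorithm idea mentioned in the abstract, the sketch below searches for an adversarial example on a tiny hand-written ensemble of decision stumps. It is not FATE's mutator or fitness function: the ensemble `STUMPS`, the `fitness` ordering, the mutation probabilities, and the [0, 1] feature domain are all assumptions made for this example. The only ideas borrowed from the abstract are that mutations may reuse the model's splitting thresholds and that a single parent is repeatedly replaced by a child that is at least as good.

```python
import random

# Hypothetical toy ensemble (an assumption, not a model from the thesis).
# Each stump: (feature index, threshold, value if x <= threshold, value otherwise).
STUMPS = [
    (0, 0.50, -1.0, 1.0),
    (1, 0.30, -0.5, 0.8),
    (0, 0.70, -0.4, 0.6),
    (2, 0.60, 0.7, -0.9),
]

def score(x):
    """Sum of the stump outputs for input x."""
    return sum(lo if x[f] <= t else hi for f, t, lo, hi in STUMPS)

def predict(x):
    return 1 if score(x) > 0 else 0

def linf(a, b):
    """L-infinity distance between two inputs."""
    return max(abs(p - q) for p, q in zip(a, b))

def fitness(x, original, label):
    """Lower is better: adversarial candidates rank first, then by distance."""
    if predict(x) != label:
        return (0, linf(x, original))
    # Not adversarial yet: prefer a smaller margin in favour of the original label.
    margin = score(x) if label == 1 else -score(x)
    return (1, margin)

def mutate(x, original, rng):
    child = list(x)
    f, t, _, _ = rng.choice(STUMPS)
    move = rng.random()
    if move < 0.4:
        # Model-specific step: hop just across one of the split thresholds.
        child[f] = t + 1e-3 if child[f] <= t else t - 1e-3
    elif move < 0.7:
        # Small random perturbation of one feature.
        child[f] += rng.gauss(0.0, 0.05)
    else:
        # Reset one feature to its original value to shrink the perturbation.
        child[f] = original[f]
    # Domain information (assumed here): features live in [0, 1].
    child[f] = min(1.0, max(0.0, child[f]))
    return child

def one_plus_one_ea(original, iterations=2000, seed=0):
    rng = random.Random(seed)
    label = predict(original)
    parent = list(original)
    best = fitness(parent, original, label)
    for _ in range(iterations):
        child = mutate(parent, original, rng)
        child_fitness = fitness(child, original, label)
        # (1+1) selection: the child replaces the parent if it is not worse.
        if child_fitness <= best:
            parent, best = child, child_fitness
    return parent, best

original = [0.2, 0.2, 0.8]
adversarial, (status, distance) = one_plus_one_ea(original)
print(predict(original), predict(adversarial), round(distance, 3))
```

A `status` of 0 means the returned input is classified differently from `original`; `distance` is then the size of the perturbation under the L-infinity norm chosen for this sketch.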
Flowr
Enhancing Dynamic Market Audience Creation
Bachelor thesis
(2017)
Ramin Safarpour Erfani, Cas Bilstra, Shane Koppers, Floris List, Niels Schenk, Christoph Lofi, Huijuan Wang, Otto Visser
Omnicom Media Group (OMG) is a company heavily involved in marketing and advertising. Our client is Annalect, a solutions provider that helps OMG's marketers make data actionable. OMG has processed cookie data, which it buys from a third party, to help its marketers set up advertisement campaigns. It also manages a vast amount of cookie data itself, part of which is currently unused. For the marketers to use this data, Annalect needs to process it and prepare database views for them. Because the marketers have no database knowledge, Annalect creates dashboards with third-party software called Tableau so that they can manipulate these views. Annalect asked us to create an application in which it can set up these dashboards, so that the marketers can manipulate the cookie data and use it for their advertisement campaigns.
We have therefore created a web application that supports the marketer's workflow. After some setup by Annalect staff, a marketer can sign in to our application, choose a dashboard, work with the cookie data, and send the manipulated data off to create an advertisement campaign, all without leaving the application. The data the marketers manipulate in our application is only a small sample of the complete data set. Because the complete data set is much larger, it has to be processed by server clusters paid for by Annalect. This processing is done at night to cut the cost of running the cluster. After processing, the result has to be sent to Google DoubleClick Campaign Manager.
Annalect also requested the ability to use machine learning algorithms. This has been implemented through a generic pipeline that supports multiple machine learning models; a model based on gradient boosting is included as a proof of concept.
To evaluate the application, several tests were done, since different aspects call for different tests. First, a usability test was performed with the end users to test the user interface. Second, unit tests were written where applicable. Finally, the machine learning model was evaluated using precision and recall together with k-fold cross-validation.
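The evaluation approach named above, precision and recall measured under k-fold cross-validation, can be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, and `fit_threshold` is a stand-in single-feature classifier, not the gradient boosting model or the pipeline from the thesis. All function names are hypothetical.

```python
import random

def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Stand-in model: the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_validate(xs, ys, k=5):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    results = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        threshold = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        preds = [1 if xs[i] > threshold else 0 for i in held_out]
        results.append(precision_recall([ys[i] for i in held_out], preds))
    return results

# Synthetic one-feature data (an assumption): two overlapping Gaussian classes.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)] + [rng.gauss(2.0, 1.0) for _ in range(50)]
ys = [0] * 50 + [1] * 50

results = cross_validate(xs, ys, k=5)
for fold, (precision, recall) in enumerate(results):
    print(f"fold {fold}: precision={precision:.2f} recall={recall:.2f}")
```

Reporting the per-fold values rather than a single average makes it easier to see how much the scores vary between folds.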
Some aspects of the application raise ethical questions. Vast amounts of cookie data must be managed discreetly, because personal information can be derived from such data, and a leak of cookie data can harm the individuals who supplied it. In addition, the right-to-explanation law coming into effect next year will force companies to explain why their computer models made certain decisions or classifications, which has implications for the machine learning models used by our application. Finally, the users of the program have to be aware that they are working with sensitive data about individuals and must act accordingly.