Solving ML with ML: Effectiveness of the Metropolis-Hastings algorithm for synthesizing Machine Learning Pipelines

None, None

Solving ML with ML: Effectiveness of the Metropolis-Hastings algorithm for synthesizing Machine Learning Pipelines

Bachelor Thesis (2023)

Author(s)

D.O. Sheremet (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S. Dumancic – Mentor (TU Delft - Algorithmics)

T.R. Hinnerichs – Graduation committee member (TU Delft - Algorithmics)

David M.J. Tax – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Copyright

Machine Learning Program Synthesis AutoML Pipeline Synthesis Automated Machine Learning

To reference this document use:

https://resolver.tudelft.nl/uuid:1d0c2a54-6e34-4727-8858-7c6a78968d22

More Info

expand_more

Publication Year

2023

Language

English

Copyright

Graduation Date

25-06-2023

Awarding Institution

Delft University of Technology

Project

['CSE3000 Research Project']

Programme

['Computer Science and Engineering']

Faculty

Electrical Engineering, Mathematics and Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In AutoML, the search space of possible pipelines is often large and multidimensional. This makes it very important to use an efficient search algorithm. We measure the effectiveness of the Metropolis-Hastings algorithm (M-H) in a pipeline synthesis framework, when the search space is described by a context-free grammar. We also compare the performance of the M-H algorithm to other search algorithms. While AutoML frameworks use many different search algorithms, and comparisons between AutoML frameworks exist, this is the first paper that compares the performance of different search algorithms in the context of pipeline synthesis under equal conditions. We found that M-H is slightly outperformed by BFS2, the simplest possible search algorithm. We conclude that the datasets we use for evaluating the algorithms are too simple to meaningfully compare the performance of different search algorithms. We also conclude that for simple datasets, simple search algorithms work best.

Files

CSE3000_Denys_Sheremet_Final_R... (pdf)

(pdf | 0.357 Mb)

License info not available