Towards a Linear-Data Monotone Wrapper Algorithm For Machine Learning Algorithms

Master Thesis (2023)
Author(s)

B.H. Kam (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.C. van Gemert – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Marco Loog – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Tom Julian Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2023 Berend Kam
Publication Year
2023
Language
English
Graduation Date
07-06-2023
Awarding Institution
Delft University of Technology
Programme
Computer Science | Artificial Intelligence
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract


Machine learning algorithms (learners) are typically expected to produce monotone learning curves, meaning that their performance improves as the size of the training dataset increases. However, this behavior is not universally observed. Recently, monotonicity of learning curves has gained renewed attention, as several authors have proposed 'wrapper' algorithms: algorithms that attempt to filter the hypotheses produced by a learner so as to turn it into a monotone learner, even if the learner itself is not monotone. Such wrappers use part of the training data as validation data, and each newly produced hypothesis is evaluated on this validation data. However, with each new hypothesis, the required validation data grows exponentially in size. The wrapper is therefore data-hungry, using up to 85% of the training data as validation data in some cases. This paper investigates what happens when a linearly growing validation sample is used instead: is that enough to retain monotonicity? We prove that selecting the best-performing hypothesis from a finite set of hypotheses, based on a validation sample that grows linearly, results in a monotone learning curve. However, when a new hypothesis is introduced with each increase in validation sample size, we observe that this selection process is no longer monotone. The authors hope that this work provides key insight into how to choose from a set of hypotheses in a monotone way, and that it may serve as a stepping stone towards a fully functioning linear-data monotone wrapper algorithm.
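To make the selection rule discussed in the abstract concrete, the sketch below shows hypothesis selection over a fixed, finite pool using the empirical error on a validation sample. This is a minimal illustration, not the thesis's implementation: the function names, the 0-1 loss, and the toy threshold classifiers are all assumptions made for the example. The abstract's positive result concerns exactly this setting (fixed pool, linearly growing validation sample), while its negative observation concerns the variant where a new candidate hypothesis is added at every sample-size step.

import numpy as np

def validation_error(h, X_val, y_val):
    """Empirical 0-1 error of hypothesis h on the validation sample."""
    return np.mean(h(X_val) != y_val)

def select_best(hypotheses, X_val, y_val):
    """Return the hypothesis from a fixed, finite pool with the lowest
    validation error. Per the abstract, with a validation sample that
    grows linearly in the training set size, this selection yields a
    monotone learning curve; growing the pool by one new hypothesis at
    every size step was observed to break monotonicity."""
    errors = [validation_error(h, X_val, y_val) for h in hypotheses]
    return hypotheses[int(np.argmin(errors))]

# Toy usage (illustrative only): three fixed threshold classifiers on 1-D data.
rng = np.random.default_rng(0)
X_val = rng.uniform(-1, 1, size=200)
y_val = (X_val > 0.1).astype(int)
pool = [lambda X, t=t: (X > t).astype(int) for t in (-0.5, 0.0, 0.5)]
best = select_best(pool, X_val, y_val)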

Files

Report_Corrected.pdf
(pdf | 1.07 Mb)
License info not available