LCDB 1.0

An Extensive Learning Curves Database for Classification Tasks

Conference Paper (2023)
Author(s)

Felix Mohr (Universidad de La Sabana, Chia)

Tom Viering (TU Delft - Pattern Recognition and Bioinformatics)

Marco Loog (TU Delft - Pattern Recognition and Bioinformatics, University of Copenhagen)

Jan N. van Rijn (Universiteit Leiden)

Research Group
Pattern Recognition and Bioinformatics
Copyright
© 2023 Felix Mohr, T.J. Viering, M. Loog, Jan N. van Rijn
DOI related publication
https://doi.org/10.1007/978-3-031-26419-1_1
Publication Year
2023
Language
English
Bibliographical Note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Pages (from-to)
3-19
ISBN (print)
9783031264184
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The use of learning curves for decision making in supervised machine learning is standard practice, yet understanding of their behavior is rather limited. To facilitate a deepening of our knowledge, we introduce the Learning Curve Database (LCDB), which contains empirical learning curves of 20 classification algorithms on 246 datasets. One of the LCDB’s unique strengths is that it contains all (probabilistic) predictions, which allows for building learning curves of arbitrary metrics. Moreover, it unifies the properties of similar high quality databases in that it (i) defines clean splits between training, validation, and test data, (ii) provides training times, and (iii) provides an API for convenient access (pip install lcdb). We demonstrate the utility of LCDB by analyzing some learning curve phenomena, such as convexity, monotonicity, peaking, and curve shapes. Improving our understanding of these matters is essential for efficient use of learning curves for model selection, speeding up model training, and determining the value of more training data.
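To illustrate why storing probabilistic predictions matters, the following is a minimal sketch of deriving learning curves for two different metrics from the same stored predictions, followed by a simple monotonicity check. It does not use the lcdb package API; the data layout (a dict mapping training-set size to per-instance class probabilities), the helper names, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def error_rate(probs, labels):
    """Fraction of instances whose most probable class differs from the label."""
    wrong = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted != y:
            wrong += 1
    return wrong / len(labels)


def log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)


def learning_curve(predictions, labels, metric):
    """Map each training-set size to the metric value of its stored predictions."""
    return {size: metric(probs, labels) for size, probs in sorted(predictions.items())}


def is_monotone_decreasing(curve):
    """True if the metric never increases as the training-set size grows."""
    values = [curve[size] for size in sorted(curve)]
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


# Made-up example: 4 test instances of a binary problem, with predictions
# stored for models trained on 16, 64, and 256 instances (hypothetical sizes).
labels = [0, 1, 1, 0]
predictions = {
    16: [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6], [0.45, 0.55]],
    64: [[0.7, 0.3], [0.4, 0.6], [0.3, 0.7], [0.55, 0.45]],
    256: [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.8, 0.2]],
}

# The same stored predictions yield a curve for any metric, without retraining.
error_curve = learning_curve(predictions, labels, error_rate)
loss_curve = learning_curve(predictions, labels, log_loss)

print(error_curve)
print(is_monotone_decreasing(loss_curve))
```

Because the metric is only applied after the predictions have been stored, adding another metric means writing one more function with the same signature; no model has to be retrained.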

Files

978_3_031_26419_1_1.pdf
(pdf | 1.02 Mb)
Embargo expired on 18-09-2023
License info not available