Population-based Active Learning for Black-Box Regression

Abstract

Many applications employ models to represent real-life environments efficiently. To make these models realistic, they are commonly fitted to a dataset of labeled samples. When obtaining a label for a sample from the environment is expensive, it is key that the dataset contains only the samples that contribute most to a realistic model. Active Learning (AL) provides search strategies for selecting these samples based on different heuristics: diversity, informativeness, and representativeness. This thesis focuses specifically on population-based AL for regression, where both the sample space and the output space are infinite. Its goal is to create a performant, efficient, extensible, and generally applicable selection strategy for this setting. For general applicability, the strategy treats the model as a black box, so that it can be used with virtually any model; for extensibility, the strategy itself is modular. It iteratively concentrates on an interesting subregion of the sample space through three modular steps: discretizing the sample space, assigning fitness scores to this discretization, and further restricting the sample space based on these fitness scores. The strategy is applied both to a scientific polynomial setting and to a car-following setting. Experiments show that this approach outperforms random sample selection in both cases, especially when a long labeling time is considered.
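The three-step loop of discretizing, scoring, and restricting can be illustrated with a small sketch. This is a hypothetical illustration, not the thesis's implementation. It assumes a one-dimensional sample space, an evenly spaced grid as the discretization, and a bootstrap committee of the black-box model whose prediction variance serves as the fitness score (an informativeness heuristic swapped in for illustration). It also assumes that the restriction step keeps a window of fixed relative width around the best-scoring grid point. The names `select_sample`, `knn_trainer`, `Trainer`, and all parameters are illustrative assumptions.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Tuple

# A trained black-box model is just "input -> prediction"; a trainer maps labeled data to one.
Predictor = Callable[[float], float]
Trainer = Callable[[Sequence[float], Sequence[float]], Predictor]


def knn_trainer(xs: Sequence[float], ys: Sequence[float], k: int = 2) -> Predictor:
    """Hypothetical stand-in for the black-box model: k-nearest-neighbour regression."""
    data = list(zip(xs, ys))

    def predict(x: float) -> float:
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def select_sample(
    xs: Sequence[float],
    ys: Sequence[float],
    train: Trainer,
    bounds: Tuple[float, float],
    n_grid: int = 11,    # grid points per iteration (must be >= 2)
    n_models: int = 5,   # committee size (assumption)
    n_iters: int = 3,    # number of discretize/score/restrict rounds
    keep: float = 0.5,   # fraction of the interval kept after each round
    rng: Optional[random.Random] = None,
) -> float:
    """Pick the next input to label by iteratively shrinking a 1-D search interval."""
    rng = rng if rng is not None else random.Random(0)
    # Assumed fitness source: a bootstrap committee of the black-box model. Only the
    # trainer and the returned predictors are called, so any model with this interface fits.
    committee: List[Predictor] = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        committee.append(train([xs[i] for i in idx], [ys[i] for i in idx]))

    lo, hi = bounds
    best = lo
    for _ in range(n_iters):
        # Step 1: discretize the current interval into an evenly spaced grid.
        step = (hi - lo) / (n_grid - 1)
        grid = [lo + i * step for i in range(n_grid)]
        # Step 2: score each candidate; here, committee disagreement (prediction variance).
        fitness = [statistics.pvariance([m(x) for m in committee]) for x in grid]
        best = grid[max(range(n_grid), key=lambda i: fitness[i])]
        # Step 3: restrict the interval to a window around the fittest candidate.
        half = (hi - lo) * keep / 2
        lo, hi = max(lo, best - half), min(hi, best + half)
    return best


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [x ** 2 for x in xs]  # toy polynomial labels
    x_next = select_sample(xs, ys, knn_trainer, bounds=(0.0, 5.0))
    print(f"next sample to label: {x_next:.3f}")
```

Because the sketch only calls `train` and the predictors it returns, a different model, a different fitness heuristic (for example, distance to the nearest labeled sample as a diversity score), or a different restriction rule can replace one step without touching the others. This is one possible way to obtain the black-box and modular properties described in the abstract.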