Data Generation Methods for Multi-Label Images of LEGO Bricks

Abstract

Data collection and annotation have proven to be a bottleneck for computer vision applications. When faced with the task of data creation, alternative methods to traditional data collection should be considered, as time and cost may be reduced significantly. We introduce three novel datasets for multi-label classification of LEGO bricks: a traditionally collected dataset, a rendered dataset, and a dataset with pasted cutouts of LEGO bricks. We investigate the accuracy of a ResNet classifier trained on each of the datasets and tested on real data. This research seeks both to provide insight into future dataset creation for LEGO bricks and to serve as a guide for general multi-label dataset creation. Our findings indicate that the traditionally collected dataset is prone to overfitting due to speedups used during collection, with 90% accuracy during training but 19% during testing. Furthermore, synthetic data techniques are applicable to multi-label LEGO classification but need improvement, with accuracy ranging from 34% to 45%.
