Automated bin-picking is a difficult task that requires solving multiple robotic vision problems, including object detection and grasp proposal generation. Current methods use deep learning to approach each of these vision problems separately, with the main focus on generating grasp proposals. For grasp proposal generation, neural networks are trained to detect grasp locations by directly proposing or scoring them. The problem with this approach is that it binds the network to a specific decision-making process that is encoded into the dataset. We propose a novel architecture, which we call map-grasp, that does not propose grasp locations but instead produces property maps describing measurable, physical object properties while simultaneously detecting and classifying items in an image. We devised a method to automatically generate a dataset containing these property maps from a point cloud; from the predicted maps, a grasping location can then be determined based on custom criteria. We train our network on a set of 1845 images and then test it on a physical system, attempting over 700 grasps on 66 different items. We compare our performance against a current state-of-the-art analytical method and show that our network outperforms it, achieving an 82.16% first-try success rate and an 89.73% total clear rate.