Object Affordance Detection for Mobile Manipulation in Retail Environments

Modern retail stores increasingly implement automated systems to assist both customers and employees. Much research is conducted on robotic applications in such environments; one of these applications concerns deploying an autonomous mobile manipulator to assist humans in retail stores. Such a robot must be able to cope with unexpected environmental changes, like obstructions in retail store aisles.

Generally, the first step in the pipeline of an autonomous mobile robot is gathering and processing environmental information using perception sensors such as cameras and LiDARs. This information is then used to plan the mobile robot's path, which is typically done without allowing interaction with the environment. In object detection, objects are first localized and then classified into object type categories. Another approach is to classify objects by their functional categories, commonly referred to as affordances.

Research on affordance classification mostly focuses on kitchen, garden and work tools, owing to the availability of affordance datasets for these objects. However, no work on affordances in retail store environments has been conducted to date; more specifically, no publicly available dataset allows mobile manipulators to identify how to interact with retail-store-related objects in such environments. This work investigated the adaptation of an instance segmentation network to localize objects on retail store floors and classify them into affordance classes. These affordance classes relate to the functional capabilities of mobile manipulators, such as graspable or pushable. To achieve this, an affordance dataset of retail-store-related objects is essential.
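To make the notion of manipulator-centric affordance classes concrete, the sketch below maps a detected floor obstruction to the two classes named above, graspable and pushable, using simple size and mass heuristics. The `Detection` fields, thresholds and decision rules are illustrative assumptions, not the method used in this work; note that one object can legitimately receive both labels.

```python
from dataclasses import dataclass

# Hypothetical affordance labels matching the two classes named in the text.
GRASPABLE = "graspable"
PUSHABLE = "pushable"

@dataclass
class Detection:
    """A detected floor obstruction (fields are illustrative assumptions)."""
    label: str
    width_m: float   # estimated object width
    mass_kg: float   # estimated mass

def affordances(det: Detection,
                max_grasp_width_m: float = 0.08,
                max_push_mass_kg: float = 30.0) -> set:
    """Assign affordance classes from simple geometric/mass heuristics.

    An object may satisfy both criteria and receive both labels,
    which is exactly the class overlap discussed in this work.
    """
    out = set()
    if det.width_m <= max_grasp_width_m:  # narrow enough for a gripper
        out.add(GRASPABLE)
    if det.mass_kg <= max_push_mass_kg:   # light enough to push aside
        out.add(PUSHABLE)
    return out

print(affordances(Detection("bottle", 0.06, 0.5)))  # both classes
print(affordances(Detection("crate", 0.40, 8.0)))   # pushable only
```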
To overcome this data scarcity, a novel dataset of 3237 images with pixel-level affordance annotations was created using an automated data generation approach. This work has shown that such an approach can drastically reduce the human labour required to acquire annotated data. A state-of-the-art instance segmentation network was trained on this synthetically generated data and tested on both synthetic and real image data. The evaluation revealed that training on synthetic data allows inference on real image data, yet may compromise the localization and, especially, the classification performance. Further, the observation of objects assigned to both affordance classes highlights the subjectivity of affordances.
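The key property of automated data generation is that pixel-level labels come for free: because each object is composited into the scene at a known position, its affordance mask requires no manual annotation. The toy sketch below illustrates this idea on a small label grid; the function name, the annotation record layout and the use of nested lists instead of real images are assumptions for illustration only.

```python
# Minimal sketch of automated pixel-level annotation generation:
# a binary object mask is pasted into a synthetic background at a
# known offset, and the affordance annotation is derived directly
# from the paste location, with no human labelling involved.

def paste_object(background, obj_mask, top, left, affordance):
    """Paste a binary object mask into a label grid and return the
    pixel-level annotation record for that instance."""
    pixels = []
    for r, row in enumerate(obj_mask):
        for c, filled in enumerate(row):
            if filled:
                background[top + r][left + c] = affordance
                pixels.append((top + r, left + c))
    return {"affordance": affordance, "pixels": pixels}

# 6x6 empty floor (0 = background); paste a 2x2 "pushable" object.
floor = [[0] * 6 for _ in range(6)]
ann = paste_object(floor, [[1, 1], [1, 1]], top=2, left=3,
                   affordance="pushable")
print(ann["affordance"], len(ann["pixels"]))  # pushable 4
```

In a real pipeline the same bookkeeping would be done with rendered RGB images and per-instance masks, but the principle is identical: the generator, not a human, knows where every object is.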