Learning object-object affordances for placement-position prediction in an item packing task

Abstract

Grocery e-commerce has grown rapidly in recent years, posing a new challenge for retailers: unlike other goods, groceries have a limited shelf life, and customers expect their orders to arrive quickly and undamaged. Currently, most processes between a customer placing an order and the delivery are performed manually in a warehouse, making them labor-intensive and time-consuming. The use of autonomous robots in these environments can help improve operational efficiency and productivity while reducing labor costs and accidents.

One process that is particularly important for delivering the customer's ordered items quickly and in the desired quality is item packing. For an autonomous robot to pack items independently, it must learn to predict placement positions that are both geometrically feasible and semantically plausible, so that items maintain their desired quality. Retail-related environments are heterogeneous and contain many different items. It is therefore more beneficial for the robot to know how an item can interact with a scene than to know what the item is. This means the robot must learn interactions between items and the scene, called affordances, in order to place items in plausible positions. These so-called object-object affordances describe how one object can interact with another object.

In this thesis, object-object affordances are learned to predict where to place items inside a box in an item packing task. Item packing is simulated in the SAPIEN simulator to generate large-scale interaction data. A model is then trained to predict a placement position from a complete point cloud of the item to be placed and a partial point cloud of the scene. An item packing pipeline is then built that packs items using the trained model. Several item packing experiments are performed to pack single items as well as lists of items. The results show that the model successfully learns the semantic relationships between objects, packing items in stable and plausible positions. Further experiments evaluate whether the model generalizes to novel items and to real point cloud data. The results show that the model successfully predicts where novel items should be placed. The model also adapts to real point cloud data of a box and predicts where items should be placed in a real box containing real items.
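
To make the described input-output structure concrete, the minimal sketch below shows one way such a placement predictor could be set up: two point cloud encoders (one for the complete item point cloud, one for the partial scene point cloud) whose global features are fused to regress a placement position. This is purely illustrative; the class names, layer sizes, and PointNet-style max-pooling encoder are assumptions and do not reflect the thesis's actual architecture or training setup.

```python
# Illustrative sketch only: a minimal PointNet-style placement predictor.
# Names, layer sizes, and architecture are hypothetical assumptions; only the
# input/output structure follows the abstract (complete item point cloud +
# partial scene point cloud -> predicted placement position).
import torch
import torch.nn as nn


class PointCloudEncoder(nn.Module):
    """Encode a (B, N, 3) point cloud into a global feature vector."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        per_point = self.mlp(points)           # (B, N, feat_dim)
        return per_point.max(dim=1).values     # max-pool over points -> (B, feat_dim)


class PlacementPredictor(nn.Module):
    """Predict a 3D placement position from item and scene point clouds."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.item_encoder = PointCloudEncoder(feat_dim)
        self.scene_encoder = PointCloudEncoder(feat_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3),                  # (x, y, z) placement position
        )

    def forward(self, item_pc: torch.Tensor, scene_pc: torch.Tensor) -> torch.Tensor:
        item_feat = self.item_encoder(item_pc)     # complete item point cloud
        scene_feat = self.scene_encoder(scene_pc)  # partial scene (box) point cloud
        return self.head(torch.cat([item_feat, scene_feat], dim=-1))


if __name__ == "__main__":
    model = PlacementPredictor()
    item = torch.rand(1, 1024, 3)     # complete point cloud of the item to place
    scene = torch.rand(1, 2048, 3)    # partial point cloud of the box/scene
    print(model(item, scene).shape)   # torch.Size([1, 3])
```

In practice, such a model would be trained on placement positions harvested from the simulated interaction data, but the data generation and loss formulation are outside the scope of this sketch.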