Continuously learning where to go next from observing pedestrians


Abstract

Socially compliant robot navigation in pedestrian environments remains challenging owing to the uncertainty of human behavior and to pedestrians' varying preferences across social contexts. Local optimization planners such as Model Predictive Control (MPC) can incorporate collision-avoidance constraints, but they produce socially compliant trajectories only if the cost function encodes the desired social behavior. The same holds for Reinforcement Learning, which requires a carefully designed reward function. However, formalizing social behavior as a reward or cost function is difficult because pedestrian behavior is complex. Imitation learning instead infers the desired behavior from human demonstrations, which makes it well suited to learning socially compliant navigation policies, but it does not account for safety. In this work, we propose to learn a socially compliant navigation policy directly from the trajectories of surrounding pedestrians, as observed by a commonly available detection and tracking pipeline, and to combine it with a local optimization planner to enhance safety. We develop a Subgoal Recommender policy that provides intermediate subgoals, guiding the local optimization planner toward socially compliant trajectories. To adapt the policy to changing social contexts without forgetting previously learned behavior, we train the Subgoal Recommender in a Continual Learning setup that exploits newly observed pedestrian data.
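The overall architecture (a learned subgoal recommender feeding a safety-constrained local planner, trained continually on new pedestrian observations) can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' method: the k-nearest-neighbor imitation policy, the reservoir-sampling replay memory used as the continual-learning mechanism, the constant-velocity pedestrian prediction, and the sampling-based one-step planner that stands in for MPC. The names `ReplayMemory`, `SubgoalRecommender`, and `local_planner` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1])


def dist(a: Vec, b: Vec) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


class ReplayMemory:
    """Bounded rehearsal buffer filled by reservoir sampling (assumed
    continual-learning strategy): old contexts stay represented as new data arrives."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: List[Tuple[Vec, Vec]] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, feature: Vec, target: Vec) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((feature, target))
        else:
            j = self.rng.randrange(self.seen)  # keep each sample with prob capacity/seen
            if j < self.capacity:
                self.items[j] = (feature, target)


class SubgoalRecommender:
    """Hypothetical k-NN imitation policy: maps the vector to the goal onto a
    subgoal offset averaged from similar situations in observed pedestrian tracks."""

    def __init__(self, memory: ReplayMemory, k: int = 3):
        self.memory = memory
        self.k = k

    def observe(self, trajectory: List[Vec], horizon: int) -> None:
        goal = trajectory[-1]  # assume the last tracked point is the pedestrian's goal
        for t in range(len(trajectory) - horizon):
            feature = sub(goal, trajectory[t])
            target = sub(trajectory[t + horizon], trajectory[t])
            self.memory.add(feature, target)

    def recommend(self, position: Vec, goal: Vec) -> Vec:
        feature = sub(goal, position)
        nearest = sorted(self.memory.items, key=lambda it: dist(it[0], feature))[: self.k]
        if not nearest:
            return goal  # nothing learned yet: head straight for the goal
        dx = sum(t[0] for _, t in nearest) / len(nearest)
        dy = sum(t[1] for _, t in nearest) / len(nearest)
        return (position[0] + dx, position[1] + dy)


def local_planner(position: Vec, subgoal: Vec, pedestrians: List[Tuple[Vec, Vec]],
                  dt: float = 0.5, v_max: float = 1.0, safe: float = 0.6) -> Vec:
    """Sampling-based stand-in for MPC: choose the velocity whose next position is
    closest to the subgoal while staying `safe` meters from every pedestrian's
    constant-velocity prediction. `pedestrians` holds (position, velocity) pairs."""
    candidates = [(0.0, 0.0)] + [
        (s * math.cos(2 * math.pi * i / 16), s * math.sin(2 * math.pi * i / 16))
        for s in (0.5 * v_max, v_max) for i in range(16)
    ]
    best, best_cost = (0.0, 0.0), float("inf")
    for v in candidates:
        nxt = (position[0] + v[0] * dt, position[1] + v[1] * dt)
        if any(dist(nxt, (p[0] + w[0] * dt, p[1] + w[1] * dt)) < safe for p, w in pedestrians):
            continue  # hard collision-avoidance constraint
        cost = dist(nxt, subgoal)
        if cost < best_cost:
            best, best_cost = v, cost
    return best  # (0, 0), i.e. stop, if every candidate is infeasible


# Toy usage with synthetic tracks (not data from the paper): pedestrians walk along +x.
memory = ReplayMemory(capacity=200)
policy = SubgoalRecommender(memory)
for y in range(5):
    policy.observe([(float(x), float(y)) for x in range(10)], horizon=2)
subgoal = policy.recommend((0.0, 0.0), (9.0, 0.0))
v = local_planner((0.0, 0.0), subgoal, pedestrians=[((3.0, 3.0), (0.0, 0.0))])
```

The design keeps collision avoidance as a hard constraint in the planner, so the learned component only shapes where the robot heads, not whether a motion is considered safe; the bounded rehearsal memory is one common way to limit forgetting when the social context changes.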
We demonstrate in simulation that our method learns a policy whose performance metrics are similar to those of the observed pedestrian trajectories (95% confidence, estimated with a t-test), while resulting in fewer collisions. The policy also adapts to the different social preferences exhibited by pedestrians while retaining the behavior learned in previously encountered social contexts. Finally, we show that our method can learn navigation policies from real pedestrian data recorded with the onboard perception pipeline of a Clearpath Jackal robot.