Safe Reinforcement Learning by Shielding for Autonomous Vehicles

Abstract

In the past few years, there has been much research in the field of Autonomous Vehicles (AVs). Integrating AVs into our daily lives could bring many advantages, but before this can happen, safe driver models that control the AVs need to be designed. Reinforcement Learning (RL) is a suitable technique for creating such models. A problem, however, is that an RL agent usually needs to execute random actions during training, which is unsafe when driving an AV. Two shields are proposed to solve this problem: a Safety Checking Shield (SCS) and a Safe Initial Policy Shield (SIPS). The SCS checks whether an action is safe by predicting the future state after taking that action and verifying that this future state is safe. The SIPS checks whether an action is safe by comparing it to a safe action from a Safe Initial Policy; based on the safety of the current state and this safe action, a range of safe actions is constructed within which the chosen action must fall. Furthermore, two shield-based learning techniques are proposed that are part of the RL algorithm and allow the agent to learn to avoid proposing actions that would be overruled by a shield: the first fabricates experiences, and the second adopts an alternative loss function. Two scenarios were created in the CARLA driving simulator to test the systems: in the first, the agent needs to learn to drive straight, and in the second, it needs to learn not to hit other vehicles on a straight road. The two shields are built around a Double Deep Q-Network (DDQN) and compared against it. Both shielding systems are shown to have zero collisions during training and execution, while achieving similar or better efficiency than the baseline DDQN. Furthermore, both shield-based learning techniques are shown to effectively teach the agent not to propose unsafe actions.
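
The following is a minimal sketch of how the two shields and the fabricated-experience technique described above could sit on top of a DDQN's action selection. It assumes a discrete action space indexed by the DDQN's Q-values; the helper names `predict_next_state`, `is_state_safe`, and `safe_initial_policy`, the `tolerance` parameter standing in for the state-dependent safe range, and the `penalty` reward are all illustrative assumptions, not details taken from the thesis, and the alternative-loss technique is not sketched here.

```python
import numpy as np


def scs_select(q_values, state, predict_next_state, is_state_safe):
    """Safety Checking Shield (SCS), sketched: discard actions whose
    predicted successor state is unsafe, then pick the best remaining one."""
    q = np.asarray(q_values, dtype=float)
    safe = np.array([is_state_safe(predict_next_state(state, a))
                     for a in range(len(q))])
    if not safe.any():
        # No action is predicted to be safe; fall back to the greedy action.
        # (How this case is handled is not stated in the abstract.)
        return int(np.argmax(q))
    return int(np.argmax(np.where(safe, q, -np.inf)))


def sips_select(q_values, state, safe_initial_policy, tolerance=1):
    """Safe Initial Policy Shield (SIPS), sketched: compare the proposed
    action to the action of a known-safe policy and force it into a range
    of actions around that safe action. `tolerance` is a stand-in for the
    range derived from the safety of the current state."""
    safe_action = safe_initial_policy(state)
    proposed = int(np.argmax(np.asarray(q_values, dtype=float)))
    return int(np.clip(proposed, safe_action - tolerance,
                       safe_action + tolerance))


def fabricate_experience(state, proposed_action, shielded_action,
                         penalty=-1.0):
    """Shield-based learning, first technique (sketched): when the shield
    overrules the proposed action, create an extra penalising experience for
    the replay buffer so the agent learns not to propose it again.
    The transition layout and penalty value are assumptions."""
    if proposed_action != shielded_action:
        # (state, action, reward, next_state, done) tuple for the buffer.
        return (state, proposed_action, penalty, state, False)
    return None
```

In this reading, the shield only intervenes at action-selection time, so the underlying DDQN update stays unchanged; the fabricated experiences (or, alternatively, a modified loss) are what let the agent internalise the shield's behaviour instead of relying on it indefinitely.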