Reinforcement Learning (RL) focuses on maximizing the return (discounted cumulative reward) over episodes. One of its main limitations is that it is ill-suited to safety-critical tasks, because the agent may transition into critical states while exploring. Safe Reinforcement Learning (SafeRL) is a subfield of RL that aims to achieve safe exploration during the learning process, thereby enabling the use of RL in safety-critical tasks. This research extends existing SafeRL algorithms by combining two previously developed safety metrics into a novel one. Furthermore, it replaces the interval-analysis bounding model with an ellipsoid-based bounding model. To validate the combined metric, two examples are used to test the performance of the various algorithms and compare their relative survivability and computational efficiency: a quadrotor navigation task and an elevator control task. Results show that the novel combined metric (ProxOp) outperforms one of the constituent metrics in the quadrotor navigation task and the other in the elevator control task. Overall, the combined metric is the better option when there is no \textit{a priori} knowledge of how either metric will behave. The ellipsoidal bounding model is tested on the elevator control task: it shows comparable performance when used in combination with the proximity and ProxOp metrics, but a significant degradation in performance when used solely with the operative metric. The ellipsoidal bounding model is also preferable in tasks where the bounds of the system model are unknown, as it initially estimates the model error using Gaussian processes.
This research thus presents a viable novel safety metric, as well as an alternative bounding model for use with it, for RL applications where safety is critical.