Transient non-stationarity and generalisation in deep reinforcement learning