Symbolic method for deriving policy in reinforcement learning