M.D.I. Museur

Recent work has shown that offline reinforcement learning (RL) generalizes to new environments worse than behavioral cloning (BC). We propose WSAC-N, an ensemble of soft actor-critic (SAC) agents weighted to de-emphasize actions with high variance. We compare the zero-shot generalization of WSAC-N with a BC baseline in a four-room, maze-like environment, testing on unseen tasks. WSAC-N shows worse zero-shot generalization than BC, in line with previous work. We also investigate how dataset characteristics affect generalization: dataset size has a negligible effect, while trajectory quality generally has a positive one. These results are consistent with prior research. ...
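The abstract does not state how WSAC-N computes its weights, so the Python sketch below only illustrates the general idea of variance-weighted ensemble training under our own assumptions. It assumes a per-transition weight of sigmoid(-T * std) + 0.5, where std is the spread of N critics' next-state Q-estimates and T is a temperature. Under this choice, transitions the critics disagree on contribute less to a soft (entropy-regularized) TD loss. The function names, the temperature, the use of the ensemble mean in the target, and the omission of terminal-state handling are all hypothetical illustrations, not the method described here.

```python
import math
import statistics
from typing import List, Tuple


def ensemble_stats(q_values: List[float]) -> Tuple[float, float]:
    """Mean and population std of N critics' Q-estimates for one (s', a')."""
    return statistics.fmean(q_values), statistics.pstdev(q_values)


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def variance_weight(std: float, temperature: float = 10.0) -> float:
    """Hypothetical weight in [0.5, 1.0]: more critic disagreement -> lower weight."""
    return stable_sigmoid(-std * temperature) + 0.5


def weighted_td_loss(
    q_pred: List[float],                 # one critic's Q(s_i, a_i) per transition
    rewards: List[float],                # r_i per transition
    next_q_ensemble: List[List[float]],  # N critics' Q(s'_i, a'_i) per transition
    next_log_probs: List[float],         # log pi(a'_i | s'_i) for the sampled next action
    gamma: float = 0.99,
    alpha: float = 0.2,
    temperature: float = 10.0,
) -> float:
    """Variance-weighted soft TD loss; assumes all transitions are non-terminal."""
    total = 0.0
    for q, r, next_qs, logp in zip(q_pred, rewards, next_q_ensemble, next_log_probs):
        mean_q, std_q = ensemble_stats(next_qs)
        # Soft (entropy-regularized) target using the ensemble mean (an assumption).
        target = r + gamma * (mean_q - alpha * logp)
        w = variance_weight(std_q, temperature)
        total += w * (q - target) ** 2
    return total / len(q_pred)


if __name__ == "__main__":
    # Same target mean (2.0); the second transition's critics disagree strongly.
    agree = weighted_td_loss([0.0], [0.0], [[2.0, 2.0, 2.0]], [0.0], gamma=0.5)
    disagree = weighted_td_loss([0.0], [0.0], [[0.0, 4.0]], [0.0], gamma=0.5)
    print(agree, disagree)
```

With identical targets, the high-disagreement transition yields roughly half the loss of the low-disagreement one, which is the de-emphasis effect in miniature. A bounded weight (never below 0.5) is one way to down-weight uncertain targets without discarding them entirely.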