M.D.I. Museur
Recent work has shown that offline reinforcement learning (RL) generalizes worse to new environments than behavioral cloning (BC). We propose WSAC-N, an ensemble of soft actor-critics that uses weights to de-emphasize actions with high variance. We compare the zero-shot generalization of WSAC-N with a BC baseline in a four-room maze-like environment, testing on unseen tasks. Our findings indicate that WSAC-N generalizes worse zero-shot than BC, in line with previous work. We also investigate the impact of dataset characteristics on generalization and find that dataset size has a negligible impact, while trajectory quality generally has a positive effect. These results are consistent with prior research.