S. Schwartz
Please Note
This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.
1 record found
Uncertainty Based Exploration in Reinforcement Learning
Analyzing the Robustness of Bayesian Deep Q-Networks
Bayesian Deep Q-Networks (BDQN) have demonstrated superior exploration capabilities and performance in complex environments such as Atari games, yet their behavior in simpler settings and their sensitivity to hyperparameters remain understudied. This work evaluates BDQN in both contextual bandit and reinforcement learning tasks, compares it against the standard ϵ-greedy exploration strategy, and analyzes its hyperparameter sensitivity. Our results indicate that BDQN outperforms ϵ-greedy DQN in exploration-heavy environments, particularly Deep Sea with sparse rewards, but performs comparably in simpler tasks where exploration is less critical. Sensitivity analysis reveals that the forgetting factor (α) plays a central role in modulating exploration, while other hyperparameters such as batch size also impact performance to varying degrees. These findings suggest BDQN is a promising strategy for complex tasks requiring persistent exploration, though it introduces additional tuning complexity.