C. Li
Please Note
<p>This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.</p>
1 record found
Autonomous robotic exploration must balance fast map growth with robustness under uncertainty. Among frontier-based methods, information-theoretic variants (e.g., Shannon/Rényi) are fast and reliable. This thesis integrates Behavioral Entropy (BE), which models human-like risk perception, into a planner-in-the-loop deep reinforcement learning (DQN) framework that learns the BE risk parameter 𝛼 online.
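For orientation, the abstract does not state the BE formula. The sketch below assumes the Prelec-weighted form commonly used for behavioral entropy; the second parameter β and the natural-log base are assumptions, not details taken from this record.

```latex
% Assumed form: Prelec probability weighting with risk parameter \alpha > 0
% and an additional scale \beta > 0 (not mentioned in the abstract).
w(p) = \exp\!\bigl(-\beta\,(-\ln p)^{\alpha}\bigr), \qquad 0 < p \le 1

% Behavioral entropy of a discrete distribution p_1, \dots, p_N:
H_B(p) = -\sum_{i=1}^{N} w(p_i)\,\ln w(p_i)

% Sanity check: with \alpha = \beta = 1 we get w(p) = p, so H_B reduces to
% the Shannon entropy -\sum_i p_i \ln p_i.
```

Under this assumed form, changing 𝛼 reshapes how strongly uncertain probabilities are weighted, which is what would make it a tunable risk attitude.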
The policy selects 𝛼 from a discrete action set, uses occlusion-aware visibility to compute BE-based information gain, and delegates motion to a classical A* planner; reward shaping couples coverage/entropy reduction with motion cost. Action-usage analysis reveals an adaptive risk schedule—aggressive early, conservative mid-episode, selectively aggressive late—enabling faster cleanup of residual area. Overall, a reinforcement-learning method with risk-attitude tuning yields a robust, planner-compatible explorer.
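The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the action set `ALPHAS`, the Prelec-weighted entropy form, the `beta` and `cost_weight` parameters, and all function names are hypothetical. The occlusion-aware visibility step and the A* planner are assumed to exist elsewhere and are represented only by their outputs (a list of visible cell probabilities and a path length).

```python
import math
import random

# Hypothetical discrete action set for the BE risk parameter alpha.
ALPHAS = [0.5, 1.0, 2.0]


def prelec(p, alpha, beta=1.0):
    """Prelec weighting w(p) = exp(-beta * (-ln p)^alpha) (assumed form)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-beta * (-math.log(p)) ** alpha)


def behavioral_entropy(p_occ, alpha, beta=1.0):
    """Entropy of one occupancy cell after Prelec weighting (assumed form)."""
    total = 0.0
    for p in (p_occ, 1.0 - p_occ):
        w = prelec(p, alpha, beta)
        if w > 0.0:
            total -= w * math.log(w)
    return total


def information_gain(visible_cell_probs, alpha):
    """Sum BE over the cells a candidate frontier is assumed to reveal."""
    return sum(behavioral_entropy(p, alpha) for p in visible_cell_probs)


def select_alpha(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over the alpha set; returns an action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def shaped_reward(entropy_before, entropy_after, path_length, cost_weight=0.1):
    """Entropy reduction minus a motion-cost penalty (hypothetical weighting)."""
    return (entropy_before - entropy_after) - cost_weight * path_length
```

In use, the index returned by `select_alpha` would pick `ALPHAS[i]`, each candidate frontier would be scored with `information_gain`, and the planner's path length for the chosen frontier would feed `shaped_reward`. With `alpha = 1` and `beta = 1` the per-cell entropy reduces to Shannon entropy.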