Off-policy evaluation faces several key problems, one of which is the “curse of horizon”. Recent breakthroughs [1] [2] have produced estimators that apply importance sampling to individual state-action pairs and rewards rather than to whole trajectories. Because the behaviour and target policies differ, a state-visitation mismatch occurs. This paper asks how the degree of state-visitation mismatch affects the overall performance of the target policy. The approach is to compute the mismatch as a KL divergence built from the state-visitation distribution of the behaviour policy and the distribution correction ratio of the DICE estimator; the state-visitation mismatch can be quantified in this way. Furthermore, the effect on target policy performance is quantified by the MSE between the estimated empirical cumulative reward and the reward estimated by the DICE estimator. Analysing the KL divergence and MSE values suggests that the state-visitation mismatch does affect the performance of the target policy, but further research is needed.
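The two quantities named in the abstract above can be illustrated with a minimal sketch for a small discrete state space. The abstract does not state the direction of the KL divergence or how the distributions are represented, so the following are assumptions made only for illustration: the target distribution is recovered as `w(s) * d_behaviour(s)`, the divergence is taken as KL(target || behaviour), and the function names `kl_from_correction_ratio` and `mse` are hypothetical.

```python
import numpy as np


def kl_from_correction_ratio(d_behaviour, w):
    """KL(d_target || d_behaviour), assuming d_target(s) = w(s) * d_behaviour(s).

    Under that assumption the divergence reduces to
    sum_s d_behaviour(s) * w(s) * log w(s).
    The direction of the divergence is an assumption, not taken from the paper.
    """
    d_behaviour = np.asarray(d_behaviour, dtype=float)
    w = np.asarray(w, dtype=float)
    d_target = w * d_behaviour
    # States with zero target mass contribute nothing (0 * log 0 := 0).
    mask = d_target > 0
    return float(np.sum(d_target[mask] * np.log(w[mask])))


def mse(estimates, reference):
    """Mean squared error between a set of estimates and a reference value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - reference) ** 2))


# Toy example with two states (values are made up for illustration).
d_b = [0.5, 0.5]   # behaviour state-visitation distribution
ratio = [1.5, 0.5]  # correction ratio, so the target distribution is [0.75, 0.25]
mismatch = kl_from_correction_ratio(d_b, ratio)
error = mse([1.0, 3.0], 2.0)
```

With a correction ratio of 1 everywhere the two distributions coincide and the divergence is zero; larger deviations of the ratio from 1 give a larger mismatch value.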
Journal article (2021)
Lorenzo De Santis, Matthew E. Trusheim, Kevin C. Chen, Dirk R. Englund
Quantum emitters in diamond are leading optically accessible solid-state qubits. Among these, Group IV-vacancy defect centers have attracted great interest as coherent and stable optical interfaces to long-lived spin states. Theory indicates that their inversion symmetry provides first-order insensitivity to stray electric fields, a common limitation for optical coherence in any host material. Here we experimentally quantify this electric field dependence via an external electric field applied to individual tin-vacancy (SnV) centers in diamond. These measurements reveal that the permanent electric dipole moment and polarizability are at least 4 orders of magnitude smaller than for the diamond nitrogen vacancy (NV) centers, representing the first direct measurement of the inversion symmetry protection of a Group IV defect in diamond. Moreover, we show that by modulating the electric-field-induced dipole we can use the SnV as a nanoscale probe of local electric field noise, and we employ this technique to highlight the effect of spectral diffusion on the SnV.