Assessing Perceived Humanness of Artificial Intelligence in Chess

A Turing Test experiment using Think Aloud and Eye Tracking methods


Abstract

With advances in Artificial Intelligence producing increasingly human-like outputs, assessing a machine’s ability to exhibit human-like intelligence has become more important than ever. This study investigates how human-like chess players perceive their opponents to be across four conditions: one human opponent and three chess engines. One of these engines, Maia, has been trained on human game data and aims to play the most human-like move. In a custom-designed experiment similar to a Turing Test, chess players played against the engines, including Maia and Stockfish, and against a human, without knowing the nature of their opponent. After each game, the player rated how human-like the opponent’s moves were and judged whether they had played against an engine or a human. During the game, participants were asked to think aloud about their next move and to react to the opponent’s moves. Additionally, the player’s gaze was recorded with the SR Research EyeLink Portable Duo at 1000 Hz. The goal was to identify differences in gaze behavior while participants tried to determine the nature of their opponent. Based on a subjective questionnaire, the perceived humanness of Maia was statistically similar to that of the human opponent and different from that of the other two chess engines. Analysis of the voice recordings identified four categories of utterances that could indicate recognition of the opponent’s nature: "expected", "unexpected", "human-like" and "engine-like". In the eye-tracking data, average fixation duration and change in pupil diameter following the opponent’s move were compared across conditions but showed no statistically significant differences. In summary, Maia was perceived as more human-like than the other chess engines. However, the experiment did not identify differences in the underlying cognitive processes by which players perceived this distinction.