M.M. Celikok

3 records found

Journal article (2025) - Hüseyin Aydin, Kevin Godin-Dubois, Libio Goncalvez Braz, Floris Den Hengst, Kim Baraka, Mustafa Mert Çelikok, Andreas Sauter, Shihan Wang, Frans A. Oliehoek
We present SHARPIE (Shared Human-AI Reinforcement Learning Platform for Interactive Experiments), a generic framework to support experiments with RL agents and humans. It consists of a versatile wrapper for RL environments and algorithm libraries, a participant-facing web interface, logging utilities, and deployment on popular cloud and participant recruitment platforms. It empowers researchers to study a wide variety of research questions related to the interaction between humans and RL agents and aims to standardize the field of study on RL in human contexts. ...
Conference paper (2025) - Saptarashmi Bandyopadhyay, Mustafa Mert Çelikok, Robert Loftin
Artificially intelligent agents deployed in the real world must be able to reliably cooperate with humans (as well as other, heterogeneous AI agents). To provide formal guarantees of successful cooperation, we must make some assumptions about how these partner agents could plausibly behave. Realistic assumptions must account for the fact that other agents may be just as adaptable as our agent is. In this work, we consider the setting where an AI agent must cooperate with members of some target population of agents in a finitely repeated two-player general-sum game, where individual utilities are private. Two natural assumptions in this setting are 1) all agents in the target population are individually rational learners, and 2) when paired with another member of the population, with high probability the agents will achieve the same expected utility as they would under some Pareto-efficient equilibrium strategy of the underlying stage game. Our theoretical results show that these assumptions alone are insufficient to select an AI strategy that achieves zero-shot cooperation with members of the target population. We therefore consider the problem of learning such a cooperation strategy using observations of members of the target population interacting with one another, and provide upper bounds on the sample complexity of learning such a cooperation strategy. Our main result shows that, under the above assumptions, these bounds can be much stronger than those arising from a “naive” reduction of the problem to one of imitation learning. ...
Conference paper (2022) - Mustafa Mert Çelikok, Frans A. Oliehoek, Samuel Kaski
Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We also discuss ways in which the machine can learn, with the help of the human, to overcome its own limitations. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure. ...