J. Kober

93 records found

Teaching Robots Long-Horizon Manipulation Skills

Journal article (2026) - Zlatan Ajanovic, Ravi Prakash, Leandro De Souza Rosa, Jens Kober
Learning from demonstration (LfD) has proved useful for teaching robots complex skills with high sample efficiency. However, teaching long-horizon tasks with multiple skills is challenging as deviations tend to accumulate, the distributional shift becomes more evident, and human teachers become fatigued over time, thereby increasing the likelihood of failure. To address these challenges, we introduce (ST)2, a sequential method for learning long-horizon manipulation tasks that allows users to control teaching flow by specifying keypoints, enabling structured and incremental demonstrations. Using this framework, we study how users respond to two teaching paradigms: 1) a traditional monolithic approach in which users demonstrate the entire task trajectory at once, and 2) a sequential approach, in which the task is segmented and demonstrated step by step. We conducted an extensive user study on the restocking task with 16 participants in a realistic retail store environment, evaluating the user preferences and effectiveness of the methods. A user-level analysis showed superior performance for the sequential approach in most cases (10 users), compared with the monolithic approach (5 users), with one tie. Our subjective results indicate that some teachers prefer sequential teaching, as it allows them to teach complicated tasks iteratively, while others prefer teaching in one go due to its simplicity. ...

Bridging Robotics and AI Toward Real-World Applications [From the Guest Editors]

Journal article (2026) - Hao Su, Yunduan Cui, Cosimo Della Santina, Kuan Fang, Jens Kober, Yanan Li, Yunzhu Li, Takamitsu Matsubara, Maria Pozzi
Journal article (2026) - Julian F. Schumann, Johan Engström, Leif Johnson, Matthew O’Kelly, Joao Messias, Jens Kober, Arkady Zgonnikov
Collision avoidance – involving a rapid threat detection and quick execution of the appropriate evasive maneuver – is a critical aspect of driving. However, existing models of human collision avoidance behavior are fragmented, focusing on specific scenarios or only describing certain aspects of the avoidance behavior, such as response times. This paper addresses these gaps by proposing a computational cognitive model of human collision avoidance behavior based on active inference. Active inference provides a unified approach to modeling human behavior: the minimization of free energy. Building on prior active inference work, our model incorporates established cognitive mechanisms such as evidence accumulation to simulate human responses in three distinct collision avoidance scenarios: front-to-rear lead vehicle braking, lateral incursion by an oncoming vehicle, and another vehicle failing to yield at an intersection. We demonstrate that our model explains a wide range of empirical findings on human collision avoidance behavior. Specifically, the model closely reproduces both aggregate results from meta-analyses previously reported in the literature and detailed, scenario-specific effects observed in two recent driving simulator studies, including response timing, maneuver selection, and execution. Our results highlight the potential of active inference as a generalizable framework for understanding and modeling human behavior in complex real-life driving tasks. ...
Planning methods often struggle with computational intractability when solving task-level problems in large-scale environments. This work explores how the commonsense knowledge encoded in Large Language Models (LLMs) can be leveraged to enhance planning techniques for such complex scenarios. Specifically, we propose an approach that uses LLMs to efficiently prune irrelevant components from the planning problem's state space, thereby substantially reducing its complexity. We demonstrate the efficacy of our system through extensive experiments in a household simulation environment as well as real-world validation on a 7-DoF manipulator (video: https://youtu.be/6ro2UOtOQS4). ...

Risk-Aware Contingency Planning with Multi-Modal Predictions

For an autonomous vehicle to operate reliably within real-world traffic scenarios, it is imperative to assess the repercussions of its prospective actions by anticipating the uncertain intentions exhibited by other participants in the traffic environment. Driven by the pronounced multi-modal nature of human driving behavior, this paper presents an approach that leverages Bayesian beliefs over the distribution of potential policies of other road users to construct a novel risk-aware probabilistic motion planning framework. In particular, we propose a novel contingency planner that outputs long-term contingent plans conditioned on multiple possible intents for other actors in the traffic scene. The Bayesian belief is incorporated into the optimization cost function to influence the behavior of the short-term plan based on the likelihood of other agents' policies. Furthermore, a probabilistic risk metric is employed to fine-tune the balance between efficiency and robustness. Through a series of closed-loop safety-critical simulated traffic scenarios shared with human-driven vehicles, we demonstrate the practical efficacy of our proposed approach that can handle multi-vehicle scenarios. ...
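The belief-weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' planner: the intent labels, likelihood values, and per-intent costs below are hypothetical, and the only technique shown is a discrete Bayes update followed by a belief-weighted cost.

```python
def update_belief(prior, likelihoods):
    """Discrete Bayes update: posterior is proportional to prior * likelihood."""
    unnormalized = {intent: prior[intent] * likelihoods[intent] for intent in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        # No intent explains the observation; keep the prior unchanged.
        return dict(prior)
    return {intent: value / total for intent, value in unnormalized.items()}


def expected_cost(belief, cost_per_intent):
    """Weight the cost of a candidate plan under each intent by its belief."""
    return sum(belief[intent] * cost_per_intent[intent] for intent in belief)


# Hypothetical example: another driver either yields or goes first.
prior = {"yield": 0.5, "go": 0.5}
likelihoods = {"yield": 0.2, "go": 0.8}  # assumed observation likelihoods
belief = update_belief(prior, likelihoods)

# Assumed cost of one candidate ego plan under each intent.
plan_cost = expected_cost(belief, {"yield": 1.0, "go": 5.0})
```

In this toy setting the observation shifts the belief toward "go", so the plan's cost under that intent dominates the weighted sum.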
Wind turbines are getting larger to increase power capacity. Their longer blades sample a larger area of the spatially and temporally varying turbulent wind field, leading to increased periodic blade load and fatigue damage over time. Individual pitch control (IPC) has proven effective in alleviating these loads by pitching the blades. Conventional IPC fully attenuates the periodic blade loads, which requires excessive pitching, leading to additional stresses on the pitch system. To balance pitch actuation and load alleviation, bounds can be set on the pitch signal (input-constrained IPC), or on the load (output-constrained IPC). While input-constrained IPC has been abundantly researched, little research has focused on output-constrained IPC and on the trade-off when operating between full IPC and no IPC. Therefore, we propose an output-constrained IPC method using an adaptive leaky integrator. The natural frequency of the leaky integrator is adapted on the error between the reference and resultant blade moment. This allows the control scheme to attain every load alleviation level between full and no IPC. Furthermore, in realistic turbulent wind conditions, operating close to full IPC leads to diminishing returns, showing that the proposed controller achieves a superior trade-off between load reduction and actuator effort. ...
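As a rough illustration of how a leaky integrator can span the range between full and no load attenuation, the sketch below uses a toy scalar model. Everything in it is an assumption made for illustration: the static plant (load = disturbance - u), the gains, and the simple adaptation rule are hypothetical and are not taken from the paper.

```python
def simulate(leak=0.0, adapt_gain=0.0, reference=0.0,
             gain=2.0, disturbance=1.0, dt=0.01, steps=5000):
    """Return (final_load, final_leak) for a toy static plant.

    Assumed plant: load = disturbance - u, with u the control signal.
    """
    u = 0.0
    for _ in range(steps):
        load = disturbance - u
        # Leaky integrator: integrates the load but forgets at rate `leak`.
        u += dt * (gain * load - leak * u)
        # Hypothetical adaptation: raise the leak while the load is below
        # the reference, lower it while above, and never go below zero.
        leak = max(0.0, leak + dt * adapt_gain * (reference - load))
    return disturbance - u, leak


# Fixed leak: the steady-state load is disturbance * leak / (gain + leak).
full_load, _ = simulate(leak=0.0)    # pure integrator, load driven to ~0
half_load, _ = simulate(leak=2.0)    # load settles at ~0.5
weak_load, _ = simulate(leak=18.0)   # load settles at ~0.9

# Adaptive leak: the load is driven to the reference instead of to zero.
tracked_load, final_leak = simulate(adapt_gain=8.0, reference=0.5)
```

With a zero leak the controller behaves like a plain integrator and removes the load entirely, while larger leaks leave more of the load in place. In this toy model, the adaptive variant settles where the load matches the chosen reference.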

GPU-Accelerated Sim2Real Framework with Delay and Dynamics Estimation

Sim2real, the transfer of control policies from simulation to the real world, is crucial for efficiently solving robotic tasks without the risks associated with real-world learning. However, discrepancies between simulated and real environments, especially due to unmodeled dynamics and latencies, significantly impact the performance of these transferred policies. In this paper, we address the challenges of sim2real transfer caused by latency and asynchronous dynamics in real-world robotic systems. Our approach involves developing a novel framework, REX (Robotic Environments with jaX), that uses a graph-based simulation model to incorporate latency effects while optimizing for parallelization on accelerator hardware. Our framework simulates the asynchronous, hierarchical nature of real-world systems, while simultaneously estimating system dynamics and delays from real-world data and implementing delay compensation strategies to minimize the sim2real gap. We validate our approach on two real-world systems, demonstrating its effectiveness in improving sim2real performance by accurately modeling both system dynamics and delays. Our results show that the proposed framework supports both accelerated simulation and real-time processing, making it valuable for robot learning. ...

A generative framework for imitation learning from observation

Conference paper (2025) - A.A. Diwan, Julen Urain, Jens Kober, Jan Peters
This paper introduces a new imitation learning framework based on energy-based generative models capable of learning complex, physics-dependent, robot motion policies through state-only expert motion trajectories. Our algorithm, called Noise-conditioned Energy-based Annealed Rewards (NEAR), constructs several perturbed versions of the expert's motion data distribution and learns smooth and well-defined representations of the data distribution's energy function using denoising score matching. We propose to use these learnt energy functions as reward functions to learn imitation policies via reinforcement learning. We also present a strategy to gradually switch between the learnt energy functions, ensuring that the learnt rewards are always well-defined in the manifold of policy-generated samples. We evaluate our algorithm on complex humanoid tasks such as locomotion and martial arts and compare it with state-only adversarial imitation learning algorithms like Adversarial Motion Priors (AMP). Our framework sidesteps the optimisation challenges of adversarial imitation learning techniques and produces results comparable to AMP in several quantitative metrics across multiple imitation settings. Code and videos available at anishhdiwan.github.io/noise-conditionedenergy-based-annealed-rewards/. ...
Journal article (2025) - M.G. Beuling, Jason Nak, Jens Kober, Jean Pierre T.F. Ho, Jan de Lange, Raoul P. P. P. Grasman, T.C.T. van Riet
Objectives: To develop and validate a questionnaire on dental students' self-efficacy with tooth removal, suitable for measuring the effectiveness of training methods. Methods: To prepare and validate this questionnaire, we used the Association of Medical Education in Europe (AMEE) stepwise guide for developing questionnaires for educational research. In the validation process, our study group conducted two pilot studies, the first for an exploratory factor analysis and the second for a confirmatory factor analysis. In addition, the questionnaire was tested for convergence with the neuroticism subscale of the NEO-Personality Inventory. Results: After an exploratory factor analysis, which used a total of 137 responses on 33 items, 15 items were left for confirmatory factor analysis. A total of 118 responses were available for the confirmatory factor analysis. Model fitness was tested using tests for exact fitness and fit indices such as the goodness of fit index (GFI), root mean square error of approximation (RMSEA) and standardised root mean squared residual (SRMR). An acceptable fit was found for 11 items divided over three factors: ‘self-perceived skill’, ‘tension’ and ‘dedication’. These 11 items did not converge with the neuroticism scale. Conclusion: This study showed the development steps and initial validation of a psychometric instrument, the Amsterdam Self-Efficacy Scale for Tooth Removal (ASES-TR), consisting of 11 items for testing dental students' self-efficacy in performing tooth removal procedures. ...
Journal article (2025) - Giovanni Franzese, Ravi Prakash, Cosimo Della Santina, Jens Kober
Learning from Interactive Demonstrations has revolutionized the way nonexpert humans teach robots. It is enough to kinesthetically move the robot around to teach pick-and-place, dressing, or cleaning policies. However, the main challenge is correctly generalizing to novel situations, e.g., different surfaces to clean or different arm postures to dress. This article proposes a novel task parameterization and generalization to transport the original robot policy, i.e., position, velocity, orientation, and stiffness. Unlike the state of the art, only a set of keypoints is tracked during the demonstration and the execution, e.g., a point cloud of the surface to clean. We then propose to fit a nonlinear transformation that would deform the space and then the original policy using the paired source and target point sets. The use of function approximators like Gaussian Processes allows us to generalize, or transport, the policy from every space location while estimating the uncertainty of the resulting policy due to the limited task keypoints and the reduced number of demonstrations. We compare the algorithm’s performance with state-of-the-art task parameterization alternatives and analyze the effect of different function approximators. We also validated the algorithm on robot manipulation tasks, i.e., different posture arm dressing, different location product reshelving, and different shape surface cleaning. ...

Active Skill-level Data Aggregation for Interactive Imitation Learning

Human teaching effort is a significant bottleneck for the broader applicability of interactive imitation learning. To reduce the number of required queries, existing methods employ active learning to query the human teacher only in uncertain, risky, or novel situations. However, during these queries, the novice’s planned actions are not utilized despite containing valuable information, such as the novice’s capabilities, as well as corresponding uncertainty levels. To this end, we allow the novice to say: “I plan to do this, but I am uncertain.” We introduce the Active Skill-level Data Aggregation (ASkDAgger) framework, which leverages teacher feedback on the novice plan in three key ways: (1) S-Aware Gating (SAG): Adjusts the gating threshold to track sensitivity, specificity, or a minimum success rate; (2) Foresight Interactive Experience Replay (FIER), which recasts valid and relabeled novice action plans into demonstrations; and (3) Prioritized Interactive Experience Replay (PIER), which prioritizes replay based on uncertainty, novice success, and demonstration age. Together, these components balance query frequency with failure incidence, reduce the number of required demonstration annotations, improve generalization, and speed up adaptation to changing domains. We validate the effectiveness of ASkDAgger through language-conditioned manipulation tasks in both simulation and real-world environments. Code, data, and videos are available at https://askdagger.github.io. ...
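One way to picture a gating threshold that tracks sensitivity is sketched below. This is an illustrative assumption, not the ASkDAgger implementation: the function names, the record format, and the rule "query when uncertainty is at or above the threshold" are all hypothetical. Sensitivity here means the fraction of invalid novice plans that would have triggered a query.

```python
import math


def threshold_for_sensitivity(records, target):
    """Pick the largest threshold that still reaches the target sensitivity.

    records: list of (uncertainty, plan_was_valid) pairs from past feedback.
    A query is assumed to be triggered when uncertainty >= threshold.
    """
    failures = sorted((u for u, valid in records if not valid), reverse=True)
    if not failures:
        return math.inf  # no known failures to catch, so never query
    # Number of past failures that must lie at or above the threshold.
    needed = max(1, math.ceil(target * len(failures)))
    return failures[needed - 1]


def should_query(uncertainty, threshold):
    """Gate: ask the teacher only when the novice is uncertain enough."""
    return uncertainty >= threshold


# Hypothetical feedback history: (novice uncertainty, was the plan valid?)
history = [(0.9, False), (0.8, True), (0.7, False),
           (0.4, False), (0.3, True), (0.1, True)]

threshold = threshold_for_sensitivity(history, target=0.66)  # -> 0.7
```

Raising the target sensitivity lowers the threshold, so more plans are sent to the teacher; lowering it reduces queries at the price of more missed failures.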

Guiding Exploration in Reinforcement Learning with Large Language Models

Conference paper (2025) - Runyu Ma, Jelle Luijkx, Zlatan Ajanovic, Jens Kober
In robot manipulation, Reinforcement Learning (RL) often suffers from low sample efficiency and uncertain convergence, especially in large observation and action spaces. Foundation Models (FMs) offer an alternative, demonstrating promise in zero-shot and few-shot settings. However, they can be unreliable due to limited physical and spatial understanding. We introduce ExploRLLM, a method that combines the strengths of both paradigms. In our approach, FMs improve RL convergence by generating policy code and efficient representations, while a residual RL agent compensates for the FMs' limited physical understanding. We show that ExploRLLM outperforms both policies derived from FMs and RL baselines in table-top manipulation tasks. Additionally, real-world experiments show that the policies exhibit promising zero-shot sim-to-real transfer. Supplementary material is available at https://explorllm.github.io. ...

Leveraging Trajectory Optimization and Behavior Cloning

Journal article (2025) - Edoardo Panichi, Jiatao Ding, Vassil Atanassov, Peiyu Yang, Jens Kober, Wei Pan, Cosimo Della Santina
Quadrupedal jumping has been intensively investigated in recent years. Still, realizing controlled jumping with soft landings remains an open challenge due to the complexity of the jump dynamics and the need to perform complex computations during the short time. This work tackles this challenge by leveraging trajectory optimization and behavior cloning. We generate an optimal jumping motion by utilizing dual-layered coarse-to-refine trajectory optimization. We combine this with a variable impedance control approach to achieve soft landing. Finally, we distill this computationally heavy jumping and landing policy into an efficient neural network via behavior cloning. Extensive simulation experiments demonstrate that, compared to classic model predictive control, the variable impedance control ensures compliance and reduces the stress on the motors during the landing phase. Furthermore, the neural network can reproduce jumping and landing behavior, achieving at least a 97.4% success rate. Hardware experiments confirm the findings, showcasing explosive jumping with soft landings and on-the-fly evaluation of the control actions. ...

Interactive Learning of Robot Situational Awareness From Camera Input

Journal article (2025) - Petr Vanc, Giovanni Franzese, Jan Kristof Behrens, Cosimo Della Santina, Karla Stepanova, Jens Kober, Robert Babuska
Learning from demonstration is a promising approach for teaching robots new skills. However, a central challenge in the execution of acquired skills is the ability to recognize faults and prevent failures. This is essential because demonstrations typically cover only a limited set of scenarios and often only the successful ones. During task execution, unforeseen situations may arise, such as changes in the robot's environment or interaction with human operators. To recognize such situations, this paper focuses on teaching the robot situational awareness by using a camera input and labeling frames as safe or risky. We train a Gaussian Process (GP) regression model fed by a low-dimensional latent space representation of the input images. The model outputs a continuous risk score ranging from zero to one, quantifying the degree of risk at each timestep. This allows for pausing task execution in unsafe situations and directly adding new training data, labeled by the human user. Our experiments on a robotic manipulator show that the proposed method can reliably detect both known and novel faults using only a single example for each new fault. In contrast, a standard multi-layer perceptron (MLP) performs well only on faults it has encountered during training. Our method enables the next generation of cobots to be rapidly deployed with easy-to-set-up, vision-based risk assessment, proactively safeguarding humans and detecting misaligned parts or missing objects before failures occur. ...

Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning

Several recent works show impressive results in mapping language-based human commands and image scene observations to direct robot executable policies (e.g., pick and place poses). However, these approaches do not consider the uncertainty of the trained policy and simply always execute actions that are suggested by the current policy as the most probable ones. This makes them vulnerable to domain shift and inefficient in the number of required demonstrations. We extend previous works and present the PARTNR algorithm that can detect ambiguities in the trained policy by analyzing multiple modes in the probability distribution of pick and place poses using topological analysis. In this way, uncertainty in action can be estimated with a single inference (and a single trained model) instead of an ensemble of models. Additionally, PARTNR employs an adaptive, sensitivity-based gating function that decides if additional user demonstrations are required. User demonstrations are aggregated to the dataset and used for subsequent training. In this way, the policy can adapt promptly to domain shift and it can minimize the number of required demonstrations for a well-trained policy. The adaptive threshold makes it possible to reach a user-acceptable level of ambiguity before executing the policy autonomously, which in turn increases the trustworthiness of our system. We demonstrate the performance of PARTNR in a table-top pick and place task. ...

A generative framework for imitation learning from observation

Poster (2025) - A.A. Diwan, Julen Urain, J. Kober, Jan Peters
This paper introduces a new imitation learning framework based on energy-based generative models capable of learning complex, physics-dependent, robot motion policies through state-only expert motion trajectories. Our algorithm, called Noise-conditioned Energy-based Annealed Rewards (NEAR), constructs several perturbed versions of the expert's motion data distribution and learns smooth and well-defined representations of the data distribution's energy function using denoising score matching. We propose to use these learnt energy functions as reward functions to learn imitation policies via reinforcement learning. We also present a strategy to gradually switch between the learnt energy functions, ensuring that the learnt rewards are always well-defined in the manifold of policy-generated samples. We evaluate our algorithm on complex humanoid tasks such as locomotion and martial arts and compare it with state-only adversarial imitation learning algorithms like Adversarial Motion Priors (AMP). Our framework sidesteps the optimisation challenges of adversarial imitation learning techniques and produces results comparable to AMP in several quantitative metrics across multiple imitation settings. ...
Journal article (2025) - Vassil Atanassov, Jiatao Ding, Jens Kober, Ioannis Havoutis, Cosimo Della Santina
Deep reinforcement learning (DRL) has emerged as a promising solution to mastering explosive and versatile quadrupedal jumping skills. However, current DRL-based frameworks usually rely on pre-existing reference trajectories obtained by capturing animal motions or transferring experience from existing controllers. This work aims to prove that learning dynamic jumping is possible without relying on imitating a reference trajectory by leveraging a curriculum design. Starting from a vertical in-place jump, we generalize the learned policy to forward and diagonal jumps and, finally, we learn to jump across obstacles. Conditioned on the desired landing location, orientation, and obstacle dimensions, the proposed approach yields a wide range of omnidirectional jumping motions in real-world experiments. In particular, we achieve a 90 cm forward jump, exceeding all previous records for similar robots. Additionally, the robot can reliably execute continuous jumping on soft grassy grounds, which is especially remarkable as such conditions were not included in the training stage. ...
Journal article (2025) - J.I.S. Hummel, J. Kober, S.P. Mulders
Individual pitch control (IPC) has been thoroughly researched for its ability to reduce wind turbine blade and tower fatigue loads. Conventional IPC often uses the multiblade coordinate (MBC) transformation and aims for full attenuation of the oscillating loads. However, this also leads to high control effort and increased fatigue damage on the pitch system. Output-constrained IPC uses the minimum actuator effort to drive loads to some reference value instead of fully attenuating them, achieving a trade-off between load reduction and actuator effort. To date, no control method exists that achieves output-constrained IPC using the conventional MBC approach. Furthermore, while multiple constrained IPC approaches have been proposed and analyzed, none of them analyze the full range of operating points between “no IPC” and “full IPC”. This paper presents two output-constrained IPC methods that use the MBC transformation. The first method, ℓ∞ IPC, independently drives the tilt and yaw moment to a tilt and yaw reference, while the second method, ℓ2 IPC, directly targets the magnitude of the combined tilt and yaw load. We furthermore analyze all operating points between no IPC and full IPC. OpenFAST simulations of the IEA 15 MW turbine were run at a wind speed of 15 m s−1. In laminar conditions, ℓ2 IPC is more efficient because it reduces the magnitude of the load directly, while ℓ∞ IPC also uses control effort to change the phase of the blade load in the direction of the load references. To assess the performance in realistic wind conditions, results are averaged over multiple turbulent wind seeds. Both ℓ∞ IPC and ℓ2 IPC have a similar performance, and the operating points between no IPC and full IPC form a nonlinear trade-off. One of the operating points in this trade-off achieves a 50 % load reduction, measured in damage equivalent load, with just 16.4 % of the actuator effort, measured in actuator duty cycle, compared to conventional IPC with the same integrator gain. This shows the potential of output-constrained IPC to facilitate a superior trade-off between load reduction and actuator effort. ...

A Graph-Based Framework for Sim2real Robot Learning

Sim2real, that is, the transfer of learned control policies from simulation to the real world, is an area of growing interest in robotics because of its potential to efficiently handle complex tasks. The sim2real approach faces challenges because of mismatches between simulation and reality. These discrepancies arise from inaccuracies in modeling physical phenomena and asynchronous control, among other factors. To this end, we introduce Engine Agnostic Graph Environments for Robotics (EAGERx), a framework with a unified software pipeline for both real and simulated robot learning. It can support various simulators and aids in integrating state, action, and time scale abstractions to facilitate learning. EAGERx’s integrated delay simulation, domain randomization features, and proposed synchronization algorithm contribute to narrowing the sim2real gap. We demonstrate (in the context of robot learning and beyond) the efficacy of EAGERx in accommodating diverse robotic systems and maintaining consistent simulation behavior. EAGERx is open source, and its code is available at https://eagerx.readthedocs.io ...

E-Go Design, Modeling, and Control

Journal article (2024) - Jiatao Ding, Perry Posthoorn, Vassil Atanassov, Fabio Boekel, Jens Kober, Cosimo Della Santina
To promote the research in compliant quadrupedal locomotion, especially with parallel elasticity, we present Delft E-Go, which is an easily accessible quadruped that combines the Unitree Go1 with open-source mechanical add-ons and control architecture. Implementing this novel system required a combination of technical work and scientific innovation. First, a dedicated parallel spring with adjustable rest length is designed to strengthen each actuated joint. Then, a novel 3-D dual spring-loaded inverted pendulum model is proposed to characterize the compliant locomotion dynamics, decoupling the actuation with parallel compliance. Based on this template model, trajectory optimization is employed to generate optimal explosive motion without requiring reference defined in advance. To complete the system, a torque controller with anticipatory compensation is adopted for motion tracking. Extensive hardware experiments in multiple scenarios, such as trotting across uneven terrains, efficient walking, and explosive pronking, demonstrate the system’s reliability, energy benefits of parallel compliance, and enhanced locomotion capability. Particularly, we demonstrate for the first time the controlled pronking of a quadruped with asymmetric legs. ...