R. Babuska
Please Note
111 records found
1
ILeSiA
Interactive Learning of Robot Situational Awareness From Camera Input
Learning from demonstration is a promising approach for teaching robots new skills. However, a central challenge in the execution of acquired skills is the ability to recognize faults and prevent failures. This is essential because demonstrations typically cover only a limited set of scenarios and often only the successful ones. During task execution, unforeseen situations may arise, such as changes in the robot's environment or interaction with human operators. To recognize such situations, this paper focuses on teaching the robot situational awareness by using a camera input and labeling frames as safe or risky. We train a Gaussian Process (GP) regression model fed by a low-dimensional latent space representation of the input images. The model outputs a continuous risk score ranging from zero to one, quantifying the degree of risk at each timestep. This allows for pausing task execution in unsafe situations and directly adding new training data, labeled by the human user. Our experiments on a robotic manipulator show that the proposed method can reliably detect both known and novel faults using only a single example for each new fault. In contrast, a standard multi-layer perceptron (MLP) performs well only on faults it has encountered during training. Our method enables the next generation of cobots to be rapidly deployed with easy-to-set-up, vision-based risk assessment, proactively safeguarding humans and detecting misaligned parts or missing objects before failures occur.
REX
GPU-Accelerated Sim2Real Framework with Delay and Dynamics Estimation
Sim2real, the transfer of control policies from simulation to the real world, is crucial for efficiently solving robotic tasks without the risks associated with real-world learning. How-ever, discrepancies between simulated and real environments, especially due to unmodeled dynamics and latencies, significantly impact the performance of these transferred policies. In this paper, we address the challenges of sim2real transfer caused by latency and asynchronous dynamics in real-world robotic systems. Our approach involves developing a novel framework, REX (Robotic Environments with jaX), that uses a graph-based simulation model to incorporate latency effects while optimizing for parallelization on accelerator hard-ware. Our framework simulates the asynchronous, hierarchical nature of real-world systems, while simultaneously estimating system dynamics and delays from real-world data and implementing delay compensation strategies to minimize the sim2real gap. We validate our approach on two real-world systems, demonstrating its effectiveness in improving sim2real performance by accurately modeling both system dynamics and delays. Our results show that the proposed framework supports both accelerated simulation and real-time processing, making it valuable for robot learning.
SmartAlert
Machine learning-based patient-ventilator asynchrony detection system in intensive care units
Background and Objective: Patient-ventilator asynchronies (PVA) are associated with ventilator-induced lung injury and increased mortality. Current detection methods rely on static thresholds, extensive preprocessing, or proprietary ventilator data. This study aimed to develop and validate a fully online, real-time system that detects and classifies PVAs directly from ventilator screen data while alerting clinicians based on severity. Methods: The SmartAlert system was developed using ventilator screen recordings from ICU patients. It extracts pressure and flow waveforms from video recordings, converts them into time-series data, and employs deep neural networks to classify asynchronies and assign alarm levels from no urgency to most urgent. A dataset of 381,280 double-breath units was independently annotated by two expert intensivists. Two deep learning models were trained: one for alarm prediction and another for asynchrony classification (ineffective triggering, double cycling, high inspiratory effort, no asynchrony). Performance was evaluated using accuracy, sensitivity, specificity, and AUC-ROC, compared to expert consensus. Results: SmartAlert demonstrated strong performance for alarm level prediction (overall accuracy: 83.8 %, weighted AUC-ROC: 0.943 [95 % CI: 0.941–0.945]) and PVA classification (weighted accuracy: 89.3 %, weighted AUC-ROC: 0.951 [95 % CI: 0.950–0.953]). It showed high specificity for urgent alarms (99.9 % for level 3) and PVA types (98.5 % for ineffective triggering, 96.9 % for double cycling, 94.8 % for high inspiratory effort). Conclusions: We developed and internally validated SmartAlert, an automated system that detects PVAs, classifies severity, and alerts clinicians in real time. Its potential to reduce alarm fatigue, optimize ventilator settings, and improve patient outcomes remains to be tested in clinical trials.
Engine Agnostic Graph Environments for Robotics (EAGERx)
A Graph-Based Framework for Sim2real Robot Learning
Sim2real, that is, the transfer of learned control policies from simulation to the real world, is an area of growing interest in robotics because of its potential to efficiently handle complex tasks. The sim2real approach faces challenges because of mismatches between simulation and reality. These discrepancies arise from inaccuracies in modeling physical phenomena and asynchronous control, among other factors. To this end, we introduce Engine Agnostic Graph Environments for Robotics (EAGERx), a framework with a unified software pipeline for both real and simulated robot learning. It can support various simulators and aids in integrating state, action, and time scale abstractions to facilitate learning. EAGERx’s integrated delay simulation, domain randomization features, and proposed synchronization algorithm contribute to narrowing the sim2real gap. We demonstrate (in the context of robot learning and beyond) the efficacy of EAGERx in accommodating diverse robotic systems and maintaining consistent simulation behavior. EAGERx is open source, and its code is available at https://eagerx.readthedocs.io
Symbolic regression is a technique that can automatically derive analytic models from data. Traditionally, symbolic regression has been implemented primarily through genetic programming that evolves populations of candidate solutions sampled by genetic operators, crossover and mutation. More recently, neural networks have been employed to learn the entire analytical model, i.e., its structure and coefficients, using regularized gradient-based optimization. Although this approach tunes the model's coefficients better, it is prone to premature convergence to suboptimal model structures. Here, we propose a neuro-evolutionary symbolic regression method that combines the strengths of evolutionary-based search for optimal neural network (NN) topologies with gradient-based tuning of the network's parameters. Due to the inherent high computational demand of evolutionary algorithms, it is not feasible to learn the parameters of every candidate NN topology to the full convergence. Thus, our method employs a memory-based strategy and population perturbations to enhance exploitation and reduce the risk of being trapped in suboptimal NNs. In this way, each NN topology can be trained using only a short sequence of back-propagation iterations. The proposed method was experimentally evaluated on three real-world test problems and has been shown to outperform other NN-based approaches regarding the quality of the models obtained.
To efficiently deploy robotic systems in society, mobile robots must move autonomously and safely through complex environments. Nonlinear model predictive control (MPC) methods provide a natural way to find a dynamically feasible trajectory through the environment without colliding with nearby obstacles. However, the limited computation power available on typical embedded robotic systems, such as quadrotors, poses a challenge to running MPC in real time, including its most expensive tasks: constraints generation and optimization. To address this problem, we propose a novel hierarchical MPC scheme that consists of a planning and a tracking layer. The planner constructs a trajectory with a long prediction horizon at a slow rate, while the tracker ensures trajectory tracking at a relatively fast rate. We prove that the proposed framework avoids collisions and is recursively feasible. Furthermore, we demonstrate its effectiveness in simulations and lab experiments with a quadrotor that needs to reach a goal position in a complex static environment. The code is efficiently implemented on the quadrotor's embedded computer to ensure real-time feasibility. Compared to a state-of-the-art single-layer MPC formulation, this allows us to increase the planning horizon by a factor of 5, which results in significantly better performance.
SymFormer
End-to-End Symbolic Regression Using Transformer-Based Architecture
Many real-world systems can be naturally described by mathematical formulas. The task of automatically constructing formulas to fit observed data is called symbolic regression. Evolutionary methods such as genetic programming have been commonly used to solve symbolic regression tasks, but they have significant drawbacks, such as high computational complexity. Recently, neural networks have been applied to symbolic regression, among which the transformer-based methods seem to be most promising. After training a transformer on a large number of formulas, the actual inference, i.e., finding a formula for new, unseen data, is very fast (in the order of seconds). This is considerably faster than state-of-the-art evolutionary methods. The main drawback of transformers is that they generate formulas without numerical constants, which have to be optimized separately, yielding suboptimal results. We propose a transformer-based approach called SymFormer, which predicts the formula by outputting the symbols and the constants simultaneously. This helps to generate formulas that fit the data more accurately. In addition, the constants provided by SymFormer serve as a good starting point for subsequent tuning via gradient descent to further improve the model accuracy. We show on several benchmarks that SymFormer outperforms state-of-the-art methods while having faster inference.
Formulating the dynamics of continuously deformable objects and other mechanical systems analytically from first principles is an exceedingly challenging task, often impractical in real-world scenarios. What makes this challenge even harder to solve is that, usually, the object has not been observed previously, and the only information that we can get from it is a stream of RGB camera data. In this study, we explore the use of deep learning techniques to solve this nonlinear identification problem. We specifically focus on extracting dynamic models of simple deformable objects from the high-dimensional sensor input coming from an RGB camera. We investigate a two-stage approach to achieve this goal. First, we train a variational autoencoder to extract an extremely low-dimensional representation of the object configuration. Then, we learn a dynamic model that predicts the evolution of these latent space variables. The proposed architecture can accurately predict the object's state up to one second into the future.
As large language models (LLMs) permeate more and more applications, an assessment of their associated security risks becomes increasingly necessary. The potential for exploitation by malicious actors, ranging from disinformation to data breaches and reputation damage, is substantial. This paper addresses a gap in current research by specifically focusing on security risks posed by LLMs within the prompt-based interaction scheme, which extends beyond the widely covered ethical and societal implications. Our work proposes a taxonomy of security risks along the user-model communication pipeline and categorizes the attacks by target and attack type alongside the commonly used confidentiality, integrity, and availability (CIA) triad. The taxonomy is reinforced with specific attack examples to showcase the real-world impact of these risks. Through this taxonomy, we aim to inform the development of robust and secure LLM applications, enhancing their safety and trustworthiness.
Currently, truss tomato weighing and packaging require significant manual work. The main obstacle to automation lies in the difficulty of developing a reliable robotic grasping system for already harvested trusses. We propose a method to grasp trusses that are stacked in a crate with considerable clutter, which is how they are commonly stored and transported after harvest. The method consists of a deep learning-based vision system to first identify the individual trusses in the crate and then determine a suitable grasping location on the stem. To this end, we have introduced a grasp pose ranking algorithm with online learning capabilities. After selecting the most promising grasp pose, the robot executes a pinch grasp without needing touch sensors or geometric models. Lab experiments with a robotic manipulator equipped with an eye-in-hand RGB-D camera showed a 100% clearance rate when tasked to pick all trusses from a pile. 93% of the trusses were successfully grasped on the first try, while the remaining 7% required more attempts.
Imitrob
Imitation Learning Dataset for Training and Evaluating 6D Object Pose Estimators
This letter introduces a dataset for training and evaluating methods for 6D pose estimation of hand-held tools in task demonstrations captured by a standard RGB camera. Despite the significant progress of 6D pose estimation methods, their performance is usually limited for heavily occluded objects, which is a common case in imitation learning, where the object is typically partially occluded by the manipulating hand. Currently, there is a lack of datasets that would enable the development of robust 6D pose estimation methods for these conditions. To overcome this problem, we collect a new dataset (Imitrob) aimed at 6D pose estimation in imitation learning and other applications where a human holds a tool and performs a task. The dataset contains image sequences of nine different tools and twelve manipulation tasks with two camera viewpoints, four human subjects, and left/right hand. Each image is accompanied by an accurate ground truth measurement of the 6D object pose obtained by the HTC Vive motion tracking device. The use of the dataset is demonstrated by training and evaluating a recent 6D object pose estimation method (DOPE) in various setups.
Toward Physically Plausible Data-Driven Models
A Novel Neural Network Approach to Symbolic Regression
Many real-world systems can be described by mathematical models that are human-comprehensible, easy to analyze and help explain the system's behavior. Symbolic regression is a method that can automatically generate such models from data. Historically, symbolic regression has been predominantly realized by genetic programming, a method that evolves populations of candidate solutions that are subsequently modified by genetic operators crossover and mutation. However, this approach suffers from several deficiencies: it does not scale well with the number of variables and samples in the training data-models tend to grow in size and complexity without an adequate accuracy gain, and it is hard to fine-tune the model coefficients using just genetic operators. Recently, neural networks have been applied to learn the whole analytic model, i.e., its structure and the coefficients, using gradient-based optimization algorithms. This paper proposes a novel neural network-based symbolic regression method that constructs physically plausible models based on even very small training data sets and prior knowledge about the system. The method employs an adaptive weighting scheme to effectively deal with multiple loss function terms and an epoch-wise learning process to reduce the chance of getting stuck in poor local optima. Furthermore, we propose a parameter-free method for choosing the model with the best interpolation and extrapolation performance out of all the models generated throughout the whole learning process. We experimentally evaluate the approach on four test systems: the TurtleBot 2 mobile robot, the magnetic manipulation system, the equivalent resistance of two resistors in parallel, and the longitudinal force of the anti-lock braking system. The results clearly show the potential of the method to find parsimonious models that comply with the prior knowledge provided.
ViewFormer
NeRF-Free Neural Rendering from Few Images Using Transformers
Novel view synthesis is a long-standing problem. In this work, we consider a variant of the problem where we are given only a few context views sparsely covering a scene or an object. The goal is to predict novel viewpoints in the scene, which requires learning priors. The current state of the art is based on Neural Radiance Field (NeRF), and while achieving impressive results, the methods suffer from long training times as they require evaluating millions of 3D point samples via a neural network for each image. We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network. Our model uses a two-stage architecture consisting of a codebook and a transformer model. The codebook is used to embed individual images into a smaller latent space, and the transformer solves the view synthesis task in this more compact space. To train our model efficiently, we introduce a novel branching attention mechanism that allows us to use the same model not only for neural rendering but also for camera pose estimation. Experimental results on real-world scenes show that our approach is competitive compared to NeRF-based methods while not reasoning explicitly in 3D, and it is faster to train.
DeepKoCo
Efficient latent planning with a task-relevant Koopman representation
Continual model learning for nonlinear dynamic systems, such as autonomous robots, presents several challenges. First, it tends to be computationally expensive as the amount of data collected by the robot quickly grows in time. Second, the model accuracy is impaired when data from repetitive motions prevail in the training set and outweigh scarcer samples that also capture interesting properties of the system. It is not known in advance which samples will be useful for model learning. Therefore, effective methods need to be employed to select informative training samples from the continuous data stream collected by the robot. Existing literature does not give any guidelines as to which of the available sample-selection methods are suitable for such a task. In this paper, we compare five sample-selection methods, including a novel method using the model prediction error. We integrate these methods into a model learning framework based on symbolic regression, which allows for learning accurate models in the form of analytic equations. Unlike the currently popular data-hungry deep learning methods, symbolic regression is able to build models even from very small training data sets. We demonstrate the approach on two real robots: the TurtleBot mobile robot and the Parrot Bebop drone. The results show that an accurate model can be constructed even from training sets as small as 24 samples. Informed sample-selection techniques based on prediction error and model variance clearly outperform uninformed methods, such as sequential or random selection.