1 

Autonomous pathtopath movement for a vehicle with mecanum wheels
Automation has played a big role in the development of the industry in the last century. Despite the desire to automate the tedious and repetitive work in greenhouses by robots, the automation level in the horticulture is still far from desired. However, the last decade shows a lot of progress in the horticulture automation and more is expected due to the increasing need for automated processes in this area. Priva B.V. started the Tomation project in 2002 to automate the task of cutting off the leaves of tomato plants. Cutting off the leaves of tomato plants is one of the crop handling tasks done on a regular basis to ensure optimal growth and ripening of the tomatoes.
This report presents research concerning the Tomation Project of Priva B.V., aimed at solving the problem of autonomous movement of the leafcutting robot in the greenhouse. The robot has to move autonomously in the path and periodically change from path to path. The focus in this report is on the pathtopath movement in the greenhouse. This is a movement where the vehicle rides off the rail, moves to the next path and finally drives back on the rail again to continue with the leaf cutting process. The robot platform is a vehicle equipped with omnidirectional mecanum wheels and powered by stepper motors from Nanotec®. These mecanum wheels are omnidirectional wheels and are able to move the vehicle in a lateral direction. The vehicle will move with mecanum wheels on the concrete path and with flanged wheels on the rail.
The vehicle with the mecanum wheels encountered difficulties driving sideways at a critical weight on the platform. Undesired movements occur at these weights, because the stepper motor is not performing all the steps. The scope of the project includes the autonomous pathtopath movement of the robot. The vision system for navigational purposes is not installed yet on the platform. Hence, the inputs from the system are entered manually in this project. It has to be taken into account that the vision system can only look at the front. The pathtopath movement should meet requirements on accuracy, time, speed and the payload of the vehicle. Furthermore, the vehicle has to be able to cope with cracks and obstructions on the road during the movement. Finally, it has to deal with rails that are shifted, because they are not fixed properly on the concrete path. Using the positional feedback of the encoders of the available stepper motor and inputs of a vision system a proper control system is required.
To get a good starting point to develop the autonomous movement of the platform, it was important to have a good understanding of the stepper motor and a clear view on the possibilities and modes of use of the motor. The working principle of the stepper motor is explained by describing the common step modes: the full step, the half step and the micro step mode. The difference between the stepper motor and a servo motor is explained. Based on the differences between these motors, the choice on the stepper motor has been reasoned. The technical specifications and the Software Development Kit of the motors are presented. These were needed to learn the possibilities in controlling and using the motors.
The mecanum wheels of the vehicle of the Tomation Project serve for omnidirectional purposes. The possibility to move in a lateral direction with these wheels was the main reason for using them. The wheels offer a way to do compact movements on the path, without obstructing the path more than necessary to avoid possible conflicts with other (moving) objects. The omnidirectional movement is realized by appropriately controlling the angular velocity of each wheel separately. Depending on each individual wheel rotation direction and velocity, the resulting combination of the wheels produces a total movement in the desired direction without changing the orientation of the wheels.
The navigational purposes of the Tomation project required a proper kinematic model of the vehicle. The equations of motion are used to derive the ideal kinematic model to reach a desired position and orientation. The measured position errors are dealt with by adjusting the kinematic model with experiments. The sources of these errors were slippage, bearing friction and point contact friction. A comprehensive research on rollers of the wheels during omnidirectional movements is performed. This was needed to understand the motion of the rollers and its contribution to the omnidirectional movements. The analyses have shown that as the movement with the vehicle gets more lateral, the rollers will contribute more to the movement. This was also observed during movements of the vehicle. These observations and analyses on the rollers are used to explain the dynamics of the wheels.
To understand why the lateral movement of the vehicle needs more phase current than the forward movement, the dynamics of the mecanum wheels are investigated. A misconception in the literature on the dynamic model is explained. The literature neglects the force perpendicular to the rollers without a proper reasoning. The aspect of motion in the rollers is not taken into account. Hence, it is decided to reject the theory in the literature on the dynamics of these vehicles. With observations and analyses on the wheels a new dynamic model has been developed. This model takes also the friction in the rollers into account. This friction causes the main difference between the required force for the forward and lateral movement. It is seen that as the movement gets more lateral, the rollers will have more contribution, so the movement will be more affected by friction. Hence, the lateral movement needs more force to overcome the friction compared to a forward movement.
The vehicle of the Tomation project encountered practical problems related to the navigational features of the vehicle. The effects of these problems are analysed and solutions are presented when needed. To navigate properly in the greenhouse a function is developed that provides the number of steps needed per wheel for a given omnidirectional movement. Furthermore, the undesired effects, caused by rounding the target speed of the motor, are dealt with by looking for the nearest movement without these errors. Next, the asynchronous start of the motors caused by the serial connection is solved with an Arduino board that generates a signal to start all the motors simultaneously. Finally, an ultrasonic proximity sensor is used to detect the concrete path when riding off the rail backwards. The vision system could not be used here, since the cameras are facing forward. These functions and additions improve the reliability and robustness of the movement in the greenhouse.
With the aforementioned functions and findings from the analyses a reliable and proper path to path movement in the greenhouse is ensured. The movement is split into three separate movements: riding off the rail, moving across to the next path and driving on the rail of the new path. After analysing alternatives for these movements, it has been decided that the vehicle will move just with two motors in the area where transition from the mecanum wheels to the flanged wheels occurs. A lateral movement with the current setup was not feasible for the required payload. If we disregard the payload of the robot, the remaining conditions did meet the requirements of the movement.
The research in this project has contributed to the autonomous movement for the vehicle of the Tomation Project of Priva. The pathtopath movement of the vehicle is realized in a reliable way, provided that the motors are strong enough to deliver the required force for the movements. The results will be used for further steps in the project.

[PDF]
[Abstract]

2 

Featherweight Camera stabilisatie systeem: Gebruiksonderzoek, ontwerp en prototype
De professionele paraglider en fotograaf Leo Westerkamp is instaat de kijker dichter bij de natuur en sport te brengen dan ieder ander log en niet wendbaar apparaat. Echter door bochten, windvlagen, luchtdrukverschillen en trillingen van de motor ondervinden de beelden een ernstig stabiliteitsprobleem. Vandaar dan ook dat Featherweight de opdracht heeft gekregen om de camerabeelden zodanig te stabiliseren dat de beelden te verkopen zijn.
Tijdens het ontwerpproces is er hoofdzakelijk rekening gehouden met de mogelijkheid de camerastabilisator lichtgewicht en makkelijk bestuurbaar te maken. Aan de andere kant mocht hierbij niet worden ingeleverd op de kwaliteit van de beelden.
Ontwerp
Om te voldoen aan de wensen van Westerkamp heeft Featherweight gekozen voor de Featherweight Professional. De Featherweight Professional bestaat uit een opzet waar de camera gestabiliseerd wordt door twee gekoppelde dc motoren. Westerkamp heeft zelf de controle om de camera ergens anders op te richten via een duimcontroller, en via de aangeraden LCDbril is hij instaat zelf realtime te kijken waar hij de camera op heeft gefocust. De motoren worden aangestuurd door een laptop, wiens data verkregen wordt door een microcontroller (ArduinoMega) en de positie van de duimcontroller. De ArduinoMega is er om de data van de acceleratiemeter en gyrometer om te zetten naar de afwijkende hoeken die ontstaan zijn door stabilisatie problemen.
Implementatie
De data van de acceleratiemeter en gyrometer wordt via een I2C protocol naar de ArduinoMega gestuurd. Hier wordt door gebruik te maken van een versimpelde Kalman filter de verkregen data omgezet naar het aantal graden dat de camera afwijkt van het punt waar hij op gestabiliseerd moet worden. De ArduinoMega stuurt het op zijn beurt door naar de laptop met de RS232 interface, waar het wordt verwerkt met MATLAB. Bij MATLAB ligt de nadruk op:
 De hoeken aanpassen door de data van de duimcontroller en de Arduino Mega te combineren;
 De hoeken begrenzen;
 De hoeken en hoeksnelheden via een adaptieve PD regelaar om te zetten naar het gewenste koppel, waar de twee RX64 motoren mee aangestuurd worden.
Daarnaast worden de huidige beelden waar de camera opgericht staat, weergegeven via een LCDscherm.
Evaluatie
Bij de evaluatie kwam naar voren dat veel van de eisen uit het programma van eisen voldaan werden, voor zover deze geïmplementeerd waren in het prototype. Het systeem is licht en compact genoeg en stabiliseert de grootste verstoringen uit het beeld. Daarnaast is het systeem makkelijk te bedienen en niet hinderlijk voor de piloot.
Aanbevelingen
Er wordt aanbevolen, voordat de Featherweight Professional in ontwikkeling wordt genomen, nader onderzoek te doen naar de gebruikerswensen, toepassen van een normale Kalman filter op de sensordata, en het omlaag brengen van de verwerkingstijd.
Ook wordt er aangeraden om mogelijk over te stappen naar een gimbal constructie, het toepassen van een lockfunctie zodat de camera altijd het gewenste object volgt, en het mogelijk toepassen van een softwarematige stabilisatie.
Om Featherweight Professional op een bredere markt te verkopen, moet er onderzoek gedaan worden naar mogelijke klanten. Dit onderzoek kan gedaan worden door een gebruikersonderzoek uit te voeren op een testgroep.

[PDF]
[Abstract]

3 

Simulation of an Artificial Respiratory System: Choosing a New Actuator for Implementation in a Lung Simulator
It is suspected that instability problems in the current generation of lung simulators are caused by its actuator, a brushless DC motor, in combination with the system configuration. The hypothesis is that these problems can be resolved by replacing the actuator with a backdrivable actuator (that is, an actuator that responds well to external force) in a new system.
In this BSc Thesis this hypothesis is researched. The backdrivable actuator (in this particular case, a Voice Coil actuator) in a new system can overcome the instability problems.

[PDF]
[Abstract]

4 

Neural network based condition monitoring for track circuits
Railway track circuits are electrical systems that are used for train detection. The identification of faults in these track circuits and the estimation of the severity of these faults is crucial for the safety and availability of railway networks. In this thesis a method is proposed to solve these tasks based on the commonly available measurement signals.
By considering the signals from multiple track circuits in a geographic area, faults can be identified from their spatial and temporal dependencies. In this thesis, artificial neural networks are used to learn these dependencies from historical data. Although not enough measurement data was available during the writing of this thesis, it is more than likely that reasonable amounts of (unlabeled) measurement data will become available at a later time, as the required measurement equipment has already been installed. To train the networks in this thesis, the small available dataset is analyzed and used together with the currently available understanding of the fault dependencies to make a generative model. The synthetic data produced by this model are used to train and test the neural networks in this thesis.
Artificial neural networks have recently achieved state of the art performance on difficult pattern recognition problems in several different fields such as image recognition and speech recognition. These recent successes can be largely attributed to the combination of large networks and large datasets. In the condition monitoring domain large datasets are generally not available. This prevents the use of the large neural networks that have become so successful in other fields. Inspite of this, some of the ideas that have become popular in other domains might still have value in the condition monitoring domain. This thesis focuses on bringing the LongShort Term Memory architecture and the concept of endtoend learning to the condition monitoring domain. To address the fact that only a limited amount of labeled data will be available, an unsupervised learning strategy is investigated. This strategy will use unlabeled data to pretrain a network so that it can more efficiently learn from the scarce labeled data.
For the fault isolation task, it is shown in this thesis that when a large amount of labeled training data is available, the endtoend learning strategy can detect and diagnose faults in the data from the generative model very accurately. When only a small amount of labeled data is available, it is shown that using a pretrained network works better than using endtoend learning.

file embargo until: 20200526
[Abstract]

5 

Nonlinear fuzzy observer for payload estimation in physical humanrobot interaction
Robotics have developed at a very high rate in the last few years. Industries are becoming capable of automating repetitive procedures without pause, and the production speed and quality are improving everyday. The main drawback of robotics in industry is the inability to deal with slight changes in an environment or in certain object parameters. This problem is very much present at Koninklijke Luchtvaart Maatschappij (KLM) Cargo, therefore they want to automate parts of their product handling cycle. KLM Cargo handles 540 000 tons of cargo every year and their processing hub is located at Schiphol airport. A number of automation opportunities exist at this location, such as the use of Automatic Guided Vehicle (AGV), smart planning algorithms and palletising/depalletising robots. This last point is addressed in this thesis.
The automation of the palletising/depalletising procedure is highly complex for a number of reasons: large variety of packages to be handled, restrictions for orientation of packages, hazardous materials, and more. The planning of a palletising order can handle the orientation problem and the placement of hazardous cargo, but the variety in size, weight and shape of packages makes fully automated palletising/depalletising difficult. The process of handling the packages is performed by manual labour and with the help of forklift trucks. An ideal solution to compensate the arduous labour is to use physical HumanRobot Interaction (pHRI). This concept allows a robot and human to handle cargo together to place it at a location. The current academic developments in humanrobot collaborative object manipulation require known object dynamics and dimensions. This information is not known at KLM cargo, and therefore, the framework of humanrobot collaborative object manipulation must be extended to allow for unknown object dynamics. One approach is currently being developed by using the Slotine and Li method of adaptive control. Unfortunately, this approach has a big drawback with regards to persistence of excitation. A new method utilizes fuzzy TakagiSugeno (TS) system with the sector nonlinearity approach where payload information is included in the systems state vector. This approach allows an observer to estimate the payload parameters without the persistence of excitation limitation. This thesis shows the motivation for using the fuzzy approach and develop it for a twolink manipulator and compare its performance to the classical approach.
While the classical approach works under persistence of excitation, a one link manipulator is shown to work without this condition. Two fuzzy observers were tested, the fuzzy Luenberger observer and a sliding mode observer. Unfortunately, for the two link manipulator a feasible observer gain is not found. The approach was tested again, but this time only to estimate the mass of the payload. Unfortunately, this did not yield a feasible observer either. In order to find out the reason for this infeasibility and create an approach that will obtain a feasible solution requires more analysis of the observability of the fuzzy system.

[PDF]
[Abstract]

6 

Ship Motion Prediction for the Ampelmann System
During rough wave climates, vessel motions prevent people to be transferred safely form and to an offshore structure. The Ampelmann system is a shipbased, self stabilizing platform that actively compensates all vessel motions to make acces to offshore structures safe, easy and fast.
The Ampelmann system uses a Steward platform to compensate ship motions. The platform consists of a rigid base frame and a rigid top frame connected by six hydraulic actuators in parallel. The system actively compensates for 5 DoF (surge, sway, heave, roll and pitch), which are measured by an Octans motion sensor.
The aim of this master thesis research is to find a algorithm which predicts realtime shortterm vessel motions. A pure time series forecasting approach was followed and both linear as nonlinear models purposed. Real vessel motion data is used to compare the models. The linear AR and ARMA give stable results, where the AR model surpasses the ARMA model in performance. As nonlinear model a Wavelet Neural Network is trained, which improves the prediction compared to the AR model, when the data is resampled.
The prediction model could be used to improve the compensation done by the Ampelmann system in several manners. For example the delays in the system could be decreased or MPC could be applied.

file embargo until: 20210210
[Abstract]

7 

Reinforcement Learning for Tracking Control in Robotics
For a robot manipulator, an accurate reference tracking capability is one of the most important performance factor. This is especially the case for a robot which is deployed for an industrial use such as welding and laser cutting. Recently, an emerging application of 3D printing also attracts researchers to use robotic manipulators in hope of gaining a higher dimension in the printing process as well as a larger workspace.
The conventional method for controlling robot manipulator involves a model based feedforward controller to eliminate the nonlinearities such as gravity and the coriolis term, combined with a feedback controller to compensate the residual error. This method, however, has a drawback in obtaining a good model of the robot. Furthermore, a robot's physical properties are subject to change over time due to, for instance, degraded gearboxes, etc. Such a problem will require the robot to reperform a system identification which is often unacceptable due to the time it takes and the possibly dangerous movement the robot must execute. Therefore, this class of controller may not be the best option for such condition.
This thesis addresses the aforementioned problem from the point of view of modelfree, learning controller which improves the tracking performance safely in an online fashion. The thesis exploits the reinforcement learning (RL) technique which uses the nature of trial and errors to correct the tracking error. Improvement to the control performance is done by feeding an additive compensation signal given by a learned policy function to the robot's nominal controller.
Two different compensation methods are proposed in the thesis. Both methods are realized by the so called actorcritic (AC) algorithm in order to cope with the continuous state of a robot arm. The first method compensates the control input of the nominal controller while the second method modifies the reference trajectory.
The methods are implemented on a real 6 DoFs 3D printing robotic setup. For the test, three different reference tracking tasks are designed and executed. Furthermore, to provide a comparative study, two conventional control methods namely model predictive control (MPC) and iterative learning control (ILC) are implemented as well. Their results are then analysed and compared to that of the proposed RLbased controllers.
The experimental result shows that the RLbased controllers improve the performance of the nominal control significantly. It is also shown that the performance of the learning controller is close to that of the modelbased control method. For a circular, smooth trajectory, the experiment even shows that the RLbased controller is able to outmatch both MPC and ILC performance.

[PDF]
[Abstract]

8 

Nonlinear State and Parameter Estimation for Hopper Dredgers
A Trailing Suction Hopper Dredger (TSHD) is a ship that excavates sediments from the sea bottom while sailing. In situ material is excavated with a special tool called the DragHead, then it is hydraulically transported through a pipe to the hopper where it is temporarily stored. After the dredging is completed the collected material is transported and discharged at a specified location. The efficiency of this process is highly dependent on the detailed knowledge of the excavated soil.
The optimization of dredging operations is of vital importance for future improvement in efficiency, accuracy and from the viewpoint of labor saving. The automated onboard systems that have been developed to optimize the dredging performance require knowledge of several uncertain soildependent parameters. These cannot be directly measured but have to be estimated online from the available measurements. Such estimation is a challenging task due to lack of sufficient sensors, severe nonlinearities in models, and timevarying nature of the parameters of interest.
In this thesis we focus on two of the most important TSHDrelated models. These are:
I. DragHead Model  describing the excavation process,
II. Hopper Model  describing the sedimentation process occurring inside the hopper.
They contain several uncertain soildependent parameters that need to be estimated.
These are:
I. horizontal cutting force coefficient kch (DragHead Model ),
II. ratio kvh between the horizontal and vertical cutting forces (DragHead Model ),
III. in situ permeability ksi (DragHead Model ),
IV. average grain diameter dm (Hopper Model ).
Both processes, together with the corresponding estimation problems, are discussed in detail in Chapter 2.
The highly uncertain and timevarying nature of the soildependent parameters and the nonlinear dynamics of the models used to describe dredging process make the estimation a challenging task. The algorithms that are capable of tackling these type of problems are Nonlinear Bayesian Filters (NBF). In Chapter 3 we review
several types of NBF, namely:
I. parametric filters based on the Taylor series expansion (EKF, IEKF),
II. parametric filters based on statistical approximations (UKF, GHF, CDF),
III. parametric filters based on Gaussian Sum approximations (GSF),
IV. nonparametric filters based on the importance sampling (BPF),
V. nonparametric filters based on the meanfield controloriented approach (FPF).
In Chapter 4 we investigate the applicability of these nonlinear filters to the estimation problems that originate from the DragHead Model. The problems are: the Cutting Estimation Problem and the Cutting and Jetting Estimation Problem. The Cutting Estimation Problem applies for any cutting excavation tool whereas the Cutting and Jetting Estimation Problem is applicable only for tools equipped with cutting and jetting components. The former problem considers estimation of the ratio kvh between cutting forces and the horizontal cutting force coefficient kch, the latter problem deals with the estimation of the horizontal cutting force coefficient kch and the in situ permeability ksi. To solve the aforementioned estimation problems one needs to handle timevarying delay in the measurement of incoming density ρi, which is discussed separately.
It is concluded that among the tested methods the best solution to the Cutting Estimation Problem is provided by the CDF and, in case of large uncertainty in the initial states, by the GSF. To solve the Cutting and Jetting Estimation Problem it is crucial to exploit the correlation between the horizontal cutting force coefficient kch and the in situ permeability ksi. This is done by a cascaded filter, which uses the PF to obtain an estimate of ksi, which will be further filtered by a Steady State Identification (SSI) filter, and finally by the BF to produce a final estimate of kch.
In Chapter 5 we develop a novel class of nonlinear particle filters: the Saturated Particle Filter (SPF) that is used to solve the Hopper Estimation Problem. The SPF is a general method designed for Saturated Stochastic Dynamical Systems (SSDS), which are severely nonlinear systems often used in modeling reallife problems. They are characterized by a constrained probability distribution exhibiting singularity on the boundary of the saturation region. Such singularities make it difficult to estimate the states or the parameters of SSDSs by standard nonlinear filters. Our new method exploits the specific structure of the SSDS in order to design an importance sampling distribution that accounts for the most recent measurements in the prediction step of the filtering algorithm.
Chapter 6 deals with the asymptotic properties of the SPF. We establish the conditions under which the SPF converges to the optimal theoretical filter. The convergence of our method is closely related to the appropriate resampling scheme. This led to the development of the improved Saturated Particle Filter (iSPF) which combines the importance sampling of the SPF with a novel resampling algorithm.
In Chapter 7 the iSPF together with other nonparametric methods from Chapter 3 are used to estimate the average grain diameter dm, which solves the Hopper Estimation Problem. Because the sedimentation process is naturally divided into three regimes, to find the most efficient filtering method we considered each mode
separately. We conclude that:
I. for the NoOverflow loading phase the best estimate of dm is obtained by the FPF,
II. for the Overflow loading phases with weak erosion, the recommended filtering method is the ReducedOrder PF,
III. for the Overflow loading phases with strong erosion, the best estimation performance is achieved by the ReducedOrder PF when the excavated soil is fine and the Hybrid SPF when the excavated soil is coarse.
The final solution to the Hopper Estimation Problem is obtained by integrating the filters designed for separate modes into a global estimator.
Chapter 8 concludes the thesis.

[PDF]
[Abstract]

9 

Online Model Learning Algorithms for ActorCritic Control
Classical control theory requires a model to be derived for a system, before any control design can take place. This can be a hard, timeconsuming process if the system is complex. Moreover, there is no way of escaping modelling errors. As an alternative approach, there is the possibility of having the system learn a controller by itself while it is in operation or offline. Reinforcement learning (RL) is such a framework in which an agent (or controller) optimises its behaviour by interacting with its environment.
For continuous state and action spaces, the use of function approximators is a necessity and a commonly used type of RL algorithms for these continuous spaces is the actorcritic algorithm, in which two independent function approximators take the role of the policy (the actor) and the value function (the critic).
A main challenge in RL is to use the information gathered during the interaction as efficiently as possible, such that an optimal policy may be reached in a short amount of time. The majority of RL algorithms at each time step measure the state, choose an action corresponding to this state, measure the next state, the corresponding reward and update a value function (and possibly a separate policy). As such, the only source of information used for learning at each time step is the last transition sample.
This thesis proposes novel actorcritic methods that aim to shorten the learning time by using every transition sample collected during learning to learn a model of the system online. It also explores the possibility of speeding up learning by providing the agent with explicit knowledge of the reward function.

[PDF]
[Abstract]

10 

Control of a macromicro actuated system
TU Delft BioRobotics Lab. developed a new type of robotic arm, called a macromicro actuated system for the inherent safety characteristics in humaninteraction environment. The macromicro actuated system is equipped with a macro actuator and a micro actuator which works together to control one joint angle. The macro actuator is connected to the link through a low stiffness spring, and the micro actuator is directly connected to the link. This was designed successfully to have the inherent safety of the robot arm. However, there remains the challenge of controlling the manipulator to possess the inherent safety with highbandwidth performance and robustness, when the robot arm performs pickandplace work of an unknown mass.
In this paper, first, the macromicro actuated system of the Delft robot arm is introduced. Second, the control challenge to have safety with highbandwidth performance is discussed along with possible control schemes for their successful implementation. In the chosen control scheme, Balancing/Tracking mode, the macro part is controlled to compensate for gravity and the micro part is controlled to realize the desired trajectory. Third, a feedback linearization method with a sliding mode algorithm and an indirect adaptive algorithm is surveyed to decrease nonlinearities of the system and to have robustness. For a fast estimation of an unknown variable and simplicity of a control algorithm, the sliding mode algorithm is applied to the chosen control scheme with the feedback linearization method. Finally, the Balancing/Tracking scheme with the feedback linearization method and the sliding mode algorithm is proposed as the controller of the macromicro actuated s ystem.

[PDF]
[Abstract]

11 

Embedding machine learning into passivity theory: a portHamiltonian approach.
Passivitybased control (PBC) is a control methodology that achieves its control objective by rendering a system passive with respect to a desired storage function. A key feature of PBC is that it exploits structural properties of the system. In this thesis, the PBC of systems endowed with a special structure, called portHamiltonian (PH) systems, has been investigated. The geometric structure of PH systems allows reformulating the PBC problem in terms of solving a generally complex partial differential equation (PDE).
Reinforcement learning (RL), on the other hand, is a learning control method that can solve complex nonlinear (stochastic) control problems without the need for a process model or explicitly solving a set of equations. In RL the controller receives an immediate numerical reward as a function of the process state and possibly control action. The goal is to find an optimal control policy that maximizes the cumulative longterm rewards, which corresponds to maximizing a value function. In this thesis, actorcritic techniques have been used, which are a class of RL methods in which a separate actor (the control law) and critic (a "memory") function are learned. A disadvantage of RL is that without having a process model it can be slow at learning and computationally expensive.
In this thesis, the goal was to design a theoretical framework using PBC techniques subject to control saturation that incorporates knowledge about the PH system and learns (optimal) control policies using actorcritic reinforcement learning. Therefore, actorcritic reinforcement learning methods have been combined with different types of PBC, e.g. EBPBC and IDAPBC. The combination of EBPBC with an actorcritic method, energybalancing actorcritic (EBAC), showed its effectiveness in the pendulum swingup problem, which was used as a benchmark test. The advantages of the method from a PBC perspective are that no PDE has to be explicitly solved, control saturation can be incorporated, the geometric structure of the PH system is preserved, (numerical) stability can be assessed using passivity theory and the learned controllers can be interpreted in terms of energyshaping strategies. From a RL perspective, the learning is speeded up because model knowledge is available.

[PDF]
[Abstract]

12 

Cogging compensation in embedded brushless motor control for haptics applications
For the development of a space qualified, 7 degrees of freedom, haptic arm exoskeleton a brushless DC motor with high torque production, power density and efficiency was selected. However, the attraction of the rotor magnets to the stator teeth introduces an additive, position dependent, torque disturbance on the motor shaft called cogging torque. In haptic applications this disturbance impacts the realism of the force reflection and limits control fidelity during masterslave teleoperation. This thesis investigates possible solutions within the constraints of the envisioned application. Due to volume, torque production and efficiency requirements the chosen motor is not to be changed. This rules out the use of motor design based cogging minimization techniques. For this reason only controlbased methods are considered. Two methods to identify the cogging waveform were developed. This data was used to do feedforward compensation using a lookuptable approach and a Fourier series approximation. For comparison, a PID feedback compensation and hybrid approach were also tested. Identification of the cogging torque and testing of compensation methods is done using a custom build measurement setup. A reduction of the RMS cogging of 39% was achieved using the feedforward approach, while the PID feedback loop resulted in a 46% reduction. A combination of these two methods in the hybrid approach resulted in a reduction of 75%.
Thesis done as part of a 'double degree' with Systems & Control at 3mE (DCSC) and Embedded Systems at EWI.

[PDF]
[Abstract]

13 

Memorybased Modeling and Prioritized Sweeping in Reinforcement Learning
Reinforcement Learning (RL) is a popular method in machine learning. In RL, an agent learns a policy by observing statetransitions and receiving feedback in the form of a reward signal. The learning problem can be solved by interaction with the system only, without prior knowledge of that system. However, realtime learning from interaction with the system only, leads to slow learning as every timeinterval can only be used to observe a single statetransition. Learning can by accelerated by using a Dynastyle algorithm. This approach learns from interaction with the real system and a model of that system simultaneously. Our research investigates two aspects of this method: Building a model during learning and implementing this model into the learning algorithm.
We use a memorybased modeling method called Local Linear Regression (LLR) to build a statetransition model during the learning process. It is expected that the quality of the model increases as the number of observed statetransitions increase. To assess the quality of the modeled statetransitions we introduce prediction intervals. We show that LLR is able to model various systems, including a complex humanoid robot.
The LLR model was added to the learning algorithm to generate more statetransitions for the agent to learn from. We show that an increasing number of experiences leads to faster learning. We introduce Prioritized Sweeping (PS) and Look Ahead (LA) Dyna as possibilities to use the model more efficiently. We show how prediction intervals can be used to increase the performance of the various algorithms. The learning algorithms were compared using an inverted pendulum simulation, which had to learn a swingup control task.

[PDF]
[Abstract]

14 

Shaping Methods to Accelerate Reinforcement Learning: From Easy to Challenging Tasks
Reinforcement learning (RL) is one of the most active research areas in artificial intelligence. In RL an agent tries to maximize the total amount of reward it receives while interacting with an environment. The reward is used to improve the policy. Conventional methods of reinforcement learning perform well for simple tasks, but as the task becomes more complex, these methods fail to converge fast or converge to a suboptimal policy. Hence, new methods of RL are needed that can handle complex tasks.
As humans, we simplify a task that is difficult to learn by first learning simplified versions of the task, before moving back to the original task. This idea of starting from simpler tasks and gradually increasing complexity, until the original task is solved, can also be exploited in RL, where it is called shaping. In order to accelerate learning in the original task, shaping methods transfer the knowledge and experiences from the easy task (source task) to the original task (target task). It is believed that the process of gradually increasing the complexity significantly reduces the difficulty of the learning problem. However, sometimes the total required time to solve the easy task plus the original task is larger than starting from scratch. In order to reduce this time, an essential decision in shaping is when to transfer learning from an easier task to a more difficult one. Transferring too early may mean the controller has not learned enough in the easy task, diminishing the usefulness of shaping. Transferring too late could make the controller waste learning time in the easy task, without making significant progress toward solving the original, complex task. If we switch the task at a proper point then the total learning time will decrease. The first part of this thesis is devoted to a thorough empirical study of the shaping methods. In the next part we try to find a suitable performance index that can be used as a switching criterion to reduce the total time needed to learn a task.

[PDF]
[Abstract]

15 

Hardware of the Respiratory Simulator: improving and stabilizing the lung simulator
The advanced technology of today has reduced the need of testing medical equipment on human patients. The invention of the lung simulator has made it possible to test ventilators without putting the condition of test patients in jeopardy. The device also allows medical staff to be trained in a safe and more rapid way.
Several types of lung simulators are available. The digitally controlled variant is the most versatile one for simulations. This type has easily adjustable static and dynamic properties, which, in contrary to traditional passive mechanical simulators, make it possible to simulate different types of breathing.
The available design of a digitally controlled lung simulator, however, has one major drawback. The system uses a motorball screw assembly to drive a piston in the air compartment. One of the properties of this assembly is that the motor shaft motion is negligibly affected (not ‘backdrivable’) by air pressure exerted on the piston by an external source. The result of this is a feedback system that needs to realize the entire dynamic response to pressure changes, by active control of the motor. Because of this, the system can easily become unstable for certain settings of the static and dynamic properties.
The objective of this thesis is to determine what the best alternative actuator is for the lung simulator, to the end of making it more stable than the current version. By improving the existing lung simulator and improving its range of operation, and its performance, it can be made more effective and accurate, which will result in better healthcare.
To reach this goal, firstly the desired requirements for the lung simulator will be obtained from the end user. After this, a literature study about alternative actuators shall be conducted. With the results of the literature study, a ranking of the alternative actuators are determined.
From the literature study eight alternative actuators have come up. These are the moving coil actuator, moving iron actuator, permanent magnet synchronous motor (PMSM), ironless core motor, piezoelectric actuator, pneumatic actuator, hydraulic actuator and wax motor. The most important requirements on which the different actuators are evaluated are backdrivability, force linearity, force, precision, speed and stroke length.
Of the eight alternative actuators, except for the PMSM and the ironless core actuator, all the others have an insufficient score on one or more of the requirements. A comparison of the PMSM and the ironless core actuator shows that the backdrivability of the ironless core actuator is better, and that it has the best overall score.
The conclusion is that the ironless core actuator is the most suitable alternative for the lung simulator. The use of the actuator should increase the stability of the lung simulator. However, no testing was possible because of malfunctioning hardware and time shortage. Therefore no practical confirmation could be obtained about the effects of the new actuator on the system.
It is recommended to put the topranked actuators to the test in a prototype. From the test results it will be possible to confirm whether the instability of the original lung simulator was indeed caused by the actuator, and whether the ironless core motor is indeed the best suitable alternative actuator for the lung simulator.

[PDF]
[Abstract]

16 

Identification and Feedforward Control of a Dropondemand Inkjet Printhead

[PDF]

17 

Learning Parameter Selection in Continuous Reinforcement Learning: Attempting to Reduce Tuning Effords
The reinforcement learning (RL) framework enables to construct controllers that try to find find an optimal control strategy in an unknown environment by trialanderror. After selecting a control action, the controller receives a numerical reward. The reward signal is based on the current state of the environment and the applied control action. The controller aims to maximize the cumulative reward, known as the return. In this thesis actorcritic and criticonly RL algorithms are considered. Actorcritic algorithms consist of an element that selects the actions (the actor) and an element that learns the expectation of the return (the critic). This expectation is captured in a value function. The critic is used to improve the control policy of the actor. Criticonly algorithms select the action by direct optimization over a value function.
Before a RL algorithm can be applied to a control problem, a number of learning parameters need to be set. The optimal values of some of these parameters are highly problem dependent. It is not straightforward how these parameters should be chosen and often these are often determined by trying a large set of parameters. The main focus of thesis is to devise an action selection method that is able to select continuous actions without problem dependent parameters. Two approaches are taken: first, it is investigated if Levenberg Marquardt (LM), a popular optimization method, can be used to determine the actor update step. Second, an action selection method is treated that lacks an explicit actor, called ValueGradient Based Policy (VGBP).
The LM algorithm uses the gradient and the Hessian to compute the update step. Therefore the policy gradient and Hessian need to be found. A novel actorcritic method has been devised, called Vanilla ActorCritic (VAC), that efficiently learns the policy gradient. On the inverted pendulum swingup task this algorithm outperformed Natural ActorCritic (NAC). A number of different approaches have been taken to approximate the policy Hessian, but none delivered a proper Hessian estimate. Therefore, no LM actor update method was created.
In VGBP the action is found by optimization of the right hand side of the Bellman equation. VGBP uses the provided reward function and a process model for this optimization. The process model is learned online using local linear regression (LLR). Due to the efficient use of information VGBP shows fast learning on the pendulum and a 2DOF robotic arm.

[PDF]
[Abstract]

18 

Imitation learning for a robotic precision placement task
In industrial environments robots are used for various tasks. At this moment it is not feasible for companies to deploy robots for productions with a limited batch size or for products with large variations. The use of robots for such environments can become feasible through a new generation of robots and software which can adapt quickly to new situations and learn from their mistakes while being programmable without needing an expert.
A concept that can enable the transition to flexible robotics is the combination of imitation learning and reinforcement learning. The purpose of imitation learning is to learn a task by generalizing from observations. The power of imitation learning is that the robot is programmed in an intuitive way while the insight of the teacher is incorporated in the execution of the task.
This research studies the combination of imitation and reinforcement learning, the research is applied to an industrial usecase. The research question of this study is: "Can imitation learning be combined with reinforcement learning to achieve a successful application in an industrial robotic precision placement task?"
To imitate the demonstrated trajectories, Dynamic Movement Primitives (DMPs) are used. The DMPs are used to encode the observed trajectories. DMPs can be seen as a springdamper like system with a nonlinear forcing term. The forcing term is a sum of Gaussian basis functions with each its corresponding weight. Reinforcement learning can be applied to these weights to alter the shape of the trajectory created by a DMP. Policy Gradients with Parameter based Exploration (PGPE) is used as reinforcement learning algorithm to optimize the recorded trajectories.
Experiments done on a UR5 show that without the learning step, the DMPs are able to provide a trajectory that results in a successful execution of a robotic precision placement task. The experiments also show that the learning algorithm is not able to remove noise from a demonstrated trajectory or complete a partial demonstrated trajectory. Therefore it can be concluded that the PGPE algorithm is not suited for reinforcement learning in robotics in its current form. It is therefore recommended to apply a dataefficient version of the PGPE algorithm in order to achieve better learning results.

[PDF]
[Abstract]

19 

RealTime Optimistic Planning for the Control of Nonlinear Systems
Optimistic Planning is a modelbased online planning algorithm that guarantees nearoptimal actions for the control arbitrarily nonlinear systems. Planning algorithms aim to find optimal actions by starting from the current state and developing a tree representation of sequences of actions and resulting states, using a model to simulate statetransitions. Typically, online planning algorithms return a sequence of actions, apply the first action (or several actions at the start of the sequence) and start planning again from the new state, resembling the receding horizon principle as seen in Model Predictive Control.
Several optimistic planning algorithms exist, of which in this work only Optimistic Planning for Deterministic systems (OPD) is considered. OPD works for large, possibly infinite state spaces, but only for finite, discrete action spaces. Unfortunately, while OPD shows good theoretical nearoptimality guarantees, there is no record yet of OPD being applied to control nonlinear physical systems in realtime. This is because of the (long) computation required by OPD.
This work analyzes two main methods that can be used to make OPD suitable for realtime applications. The first approach is to increase the computational speed of the planning process by parallelizing the algorithm. Unfortunately, while parallelization has been proven to be able to increase the computational speed in classical planning, in experiments no improvement is found yet for OPD using parallelization. However, a potential benefit from creating a parallel version of OPD is not ruled out and it is expected that more research and more efficient implementations could still lead to an increase in the computational speed.
The second approach is to apply sequences of actions instead of single actions, which increases the time available for the planning process. Replanning starts immediately after a sequence is returned, using as initial state a prediction of the state at the end of the previous sequence. The resulting algorithm is called RealTime Optimistic Planning with Action Sequences (RTOPS). Extensive analysis is performed to find restrictions on the parameters of the algorithm that, when met, can guarantee realtime applicability. Additionally, the effect of using sequences of actions on the performance of the algorithm is investigated and bounds are put on the maximum performance loss.
The performance of RTOPS has been tested in various experiments on different problems: a cartpole simulation, an acrobot simulation and a real inverted pendulum. Different settings are compared and, overall, RTOPS proves to perform well, without violating realtime constraints. The experiments prove that RTOPS allows for the use of optimistic planning for realtime control of physical nonlinear systems.
Future work should focus on applying the ideas used to develop RTOPS to other optimistic planning algorithms, such as those that allow for continuous actions or stochastic systems. Furthermore, a parallelization of RTOPS could be developed that increases its computational speed.

[PDF]
[Abstract]

20 

Nonlinear and learning control of a onedimensional magnetic manipulator
This thesis describes the control design for a magnetic manipulator. The experimental setup has four electromagnets (coils) which can be used to shape the magnetic field above the magnets by controlling the currents through all coils. By doing this, the steel ball can be positioned horizontally in one degree of freedom. The control objective consists of accurate and fast regulation of the ball. The magnetic force created by the coils is highly nonlinear. An empirical model is used to approximate the force exerted by each coil. This leads to a constrained nonlinear control problem. Multiple nonlinear controllers are designed: a Feedback Linearization (FL) controller as benchmark, a State Dependent Riccati Equation (SDRE) controller, a ConstrainedSDRE (MPC approach) and a Nonlinear Model Predictive Controller (NMPC). Furthermore, two learning controllers are designed: a Reinforcement Learning controller and an Imitation Learning controller based on Local Linear Regression (LLR).
All controllers are evaluated in a simulation study. A satisfactory performance was achieved for both the FL controller and the ConstrainedSDRE controller.
The NMPC is not feasible in realtime and the RL controller did not achieve a satisfactory performance.
The other four controllers were successfully implemented on the experimental setup. From the results of the modelbased controllers on the setup it can be concluded that the Constrained SDRE performs best in terms of settling time, overshoot and control effort. The imitation learning controller is able to match the performance of the CSDRE controller, and performs better than the CSDRE in terms of adapting to a different ball size.

file embargo until: 20160630
[Abstract]
