1 

Simulation of an Artificial Respiratory System: Choosing a New Actuator for Implementation in a Lung Simulator
It is suspected that instability problems in the current generation of lung simulators are caused by its actuator, a brushless DC motor, in combination with the system configuration. The hypothesis is that these problems can be resolved by replacing the actuator with a backdrivable actuator (that is, an actuator that responds well to external force) in a new system.
In this BSc Thesis this hypothesis is researched. The backdrivable actuator (in this particular case, a Voice Coil actuator) in a new system can overcome the instability problems.

[PDF]
[Abstract]

2 

Featherweight Camera stabilisatie systeem: Gebruiksonderzoek, ontwerp en prototype
De professionele paraglider en fotograaf Leo Westerkamp is instaat de kijker dichter bij de natuur en sport te brengen dan ieder ander log en niet wendbaar apparaat. Echter door bochten, windvlagen, luchtdrukverschillen en trillingen van de motor ondervinden de beelden een ernstig stabiliteitsprobleem. Vandaar dan ook dat Featherweight de opdracht heeft gekregen om de camerabeelden zodanig te stabiliseren dat de beelden te verkopen zijn.
Tijdens het ontwerpproces is er hoofdzakelijk rekening gehouden met de mogelijkheid de camerastabilisator lichtgewicht en makkelijk bestuurbaar te maken. Aan de andere kant mocht hierbij niet worden ingeleverd op de kwaliteit van de beelden.
Ontwerp
Om te voldoen aan de wensen van Westerkamp heeft Featherweight gekozen voor de Featherweight Professional. De Featherweight Professional bestaat uit een opzet waar de camera gestabiliseerd wordt door twee gekoppelde dc motoren. Westerkamp heeft zelf de controle om de camera ergens anders op te richten via een duimcontroller, en via de aangeraden LCDbril is hij instaat zelf realtime te kijken waar hij de camera op heeft gefocust. De motoren worden aangestuurd door een laptop, wiens data verkregen wordt door een microcontroller (ArduinoMega) en de positie van de duimcontroller. De ArduinoMega is er om de data van de acceleratiemeter en gyrometer om te zetten naar de afwijkende hoeken die ontstaan zijn door stabilisatie problemen.
Implementatie
De data van de acceleratiemeter en gyrometer wordt via een I2C protocol naar de ArduinoMega gestuurd. Hier wordt door gebruik te maken van een versimpelde Kalman filter de verkregen data omgezet naar het aantal graden dat de camera afwijkt van het punt waar hij op gestabiliseerd moet worden. De ArduinoMega stuurt het op zijn beurt door naar de laptop met de RS232 interface, waar het wordt verwerkt met MATLAB. Bij MATLAB ligt de nadruk op:
 De hoeken aanpassen door de data van de duimcontroller en de Arduino Mega te combineren;
 De hoeken begrenzen;
 De hoeken en hoeksnelheden via een adaptieve PD regelaar om te zetten naar het gewenste koppel, waar de twee RX64 motoren mee aangestuurd worden.
Daarnaast worden de huidige beelden waar de camera opgericht staat, weergegeven via een LCDscherm.
Evaluatie
Bij de evaluatie kwam naar voren dat veel van de eisen uit het programma van eisen voldaan werden, voor zover deze geïmplementeerd waren in het prototype. Het systeem is licht en compact genoeg en stabiliseert de grootste verstoringen uit het beeld. Daarnaast is het systeem makkelijk te bedienen en niet hinderlijk voor de piloot.
Aanbevelingen
Er wordt aanbevolen, voordat de Featherweight Professional in ontwikkeling wordt genomen, nader onderzoek te doen naar de gebruikerswensen, toepassen van een normale Kalman filter op de sensordata, en het omlaag brengen van de verwerkingstijd.
Ook wordt er aangeraden om mogelijk over te stappen naar een gimbal constructie, het toepassen van een lockfunctie zodat de camera altijd het gewenste object volgt, en het mogelijk toepassen van een softwarematige stabilisatie.
Om Featherweight Professional op een bredere markt te verkopen, moet er onderzoek gedaan worden naar mogelijke klanten. Dit onderzoek kan gedaan worden door een gebruikersonderzoek uit te voeren op een testgroep.

[PDF]
[Abstract]

3 

Autonomous pathtopath movement for a vehicle with mecanum wheels
Automation has played a big role in the development of the industry in the last century. Despite the desire to automate the tedious and repetitive work in greenhouses by robots, the automation level in the horticulture is still far from desired. However, the last decade shows a lot of progress in the horticulture automation and more is expected due to the increasing need for automated processes in this area. Priva B.V. started the Tomation project in 2002 to automate the task of cutting off the leaves of tomato plants. Cutting off the leaves of tomato plants is one of the crop handling tasks done on a regular basis to ensure optimal growth and ripening of the tomatoes.
This report presents research concerning the Tomation Project of Priva B.V., aimed at solving the problem of autonomous movement of the leafcutting robot in the greenhouse. The robot has to move autonomously in the path and periodically change from path to path. The focus in this report is on the pathtopath movement in the greenhouse. This is a movement where the vehicle rides off the rail, moves to the next path and finally drives back on the rail again to continue with the leaf cutting process. The robot platform is a vehicle equipped with omnidirectional mecanum wheels and powered by stepper motors from Nanotec®. These mecanum wheels are omnidirectional wheels and are able to move the vehicle in a lateral direction. The vehicle will move with mecanum wheels on the concrete path and with flanged wheels on the rail.
The vehicle with the mecanum wheels encountered difficulties driving sideways at a critical weight on the platform. Undesired movements occur at these weights, because the stepper motor is not performing all the steps. The scope of the project includes the autonomous pathtopath movement of the robot. The vision system for navigational purposes is not installed yet on the platform. Hence, the inputs from the system are entered manually in this project. It has to be taken into account that the vision system can only look at the front. The pathtopath movement should meet requirements on accuracy, time, speed and the payload of the vehicle. Furthermore, the vehicle has to be able to cope with cracks and obstructions on the road during the movement. Finally, it has to deal with rails that are shifted, because they are not fixed properly on the concrete path. Using the positional feedback of the encoders of the available stepper motor and inputs of a vision system a proper control system is required.
To get a good starting point to develop the autonomous movement of the platform, it was important to have a good understanding of the stepper motor and a clear view on the possibilities and modes of use of the motor. The working principle of the stepper motor is explained by describing the common step modes: the full step, the half step and the micro step mode. The difference between the stepper motor and a servo motor is explained. Based on the differences between these motors, the choice on the stepper motor has been reasoned. The technical specifications and the Software Development Kit of the motors are presented. These were needed to learn the possibilities in controlling and using the motors.
The mecanum wheels of the vehicle of the Tomation Project serve for omnidirectional purposes. The possibility to move in a lateral direction with these wheels was the main reason for using them. The wheels offer a way to do compact movements on the path, without obstructing the path more than necessary to avoid possible conflicts with other (moving) objects. The omnidirectional movement is realized by appropriately controlling the angular velocity of each wheel separately. Depending on each individual wheel rotation direction and velocity, the resulting combination of the wheels produces a total movement in the desired direction without changing the orientation of the wheels.
The navigational purposes of the Tomation project required a proper kinematic model of the vehicle. The equations of motion are used to derive the ideal kinematic model to reach a desired position and orientation. The measured position errors are dealt with by adjusting the kinematic model with experiments. The sources of these errors were slippage, bearing friction and point contact friction. A comprehensive research on rollers of the wheels during omnidirectional movements is performed. This was needed to understand the motion of the rollers and its contribution to the omnidirectional movements. The analyses have shown that as the movement with the vehicle gets more lateral, the rollers will contribute more to the movement. This was also observed during movements of the vehicle. These observations and analyses on the rollers are used to explain the dynamics of the wheels.
To understand why the lateral movement of the vehicle needs more phase current than the forward movement, the dynamics of the mecanum wheels are investigated. A misconception in the literature on the dynamic model is explained. The literature neglects the force perpendicular to the rollers without a proper reasoning. The aspect of motion in the rollers is not taken into account. Hence, it is decided to reject the theory in the literature on the dynamics of these vehicles. With observations and analyses on the wheels a new dynamic model has been developed. This model takes also the friction in the rollers into account. This friction causes the main difference between the required force for the forward and lateral movement. It is seen that as the movement gets more lateral, the rollers will have more contribution, so the movement will be more affected by friction. Hence, the lateral movement needs more force to overcome the friction compared to a forward movement.
The vehicle of the Tomation project encountered practical problems related to the navigational features of the vehicle. The effects of these problems are analysed and solutions are presented when needed. To navigate properly in the greenhouse a function is developed that provides the number of steps needed per wheel for a given omnidirectional movement. Furthermore, the undesired effects, caused by rounding the target speed of the motor, are dealt with by looking for the nearest movement without these errors. Next, the asynchronous start of the motors caused by the serial connection is solved with an Arduino board that generates a signal to start all the motors simultaneously. Finally, an ultrasonic proximity sensor is used to detect the concrete path when riding off the rail backwards. The vision system could not be used here, since the cameras are facing forward. These functions and additions improve the reliability and robustness of the movement in the greenhouse.
With the aforementioned functions and findings from the analyses a reliable and proper path to path movement in the greenhouse is ensured. The movement is split into three separate movements: riding off the rail, moving across to the next path and driving on the rail of the new path. After analysing alternatives for these movements, it has been decided that the vehicle will move just with two motors in the area where transition from the mecanum wheels to the flanged wheels occurs. A lateral movement with the current setup was not feasible for the required payload. If we disregard the payload of the robot, the remaining conditions did meet the requirements of the movement.
The research in this project has contributed to the autonomous movement for the vehicle of the Tomation Project of Priva. The pathtopath movement of the vehicle is realized in a reliable way, provided that the motors are strong enough to deliver the required force for the movements. The results will be used for further steps in the project.

file embargo until: 20151218
[Abstract]

4 

Nonlinear State and Parameter Estimation for Hopper Dredgers
A Trailing Suction Hopper Dredger (TSHD) is a ship that excavates sediments from the sea bottom while sailing. In situ material is excavated with a special tool called the DragHead, then it is hydraulically transported through a pipe to the hopper where it is temporarily stored. After the dredging is completed the collected material is transported and discharged at a specified location. The efficiency of this process is highly dependent on the detailed knowledge of the excavated soil.
The optimization of dredging operations is of vital importance for future improvement in efficiency, accuracy and from the viewpoint of labor saving. The automated onboard systems that have been developed to optimize the dredging performance require knowledge of several uncertain soildependent parameters. These cannot be directly measured but have to be estimated online from the available measurements. Such estimation is a challenging task due to lack of sufficient sensors, severe nonlinearities in models, and timevarying nature of the parameters of interest.
In this thesis we focus on two of the most important TSHDrelated models. These are:
I. DragHead Model  describing the excavation process,
II. Hopper Model  describing the sedimentation process occurring inside the hopper.
They contain several uncertain soildependent parameters that need to be estimated.
These are:
I. horizontal cutting force coefficient kch (DragHead Model ),
II. ratio kvh between the horizontal and vertical cutting forces (DragHead Model ),
III. in situ permeability ksi (DragHead Model ),
IV. average grain diameter dm (Hopper Model ).
Both processes, together with the corresponding estimation problems, are discussed in detail in Chapter 2.
The highly uncertain and timevarying nature of the soildependent parameters and the nonlinear dynamics of the models used to describe dredging process make the estimation a challenging task. The algorithms that are capable of tackling these type of problems are Nonlinear Bayesian Filters (NBF). In Chapter 3 we review
several types of NBF, namely:
I. parametric filters based on the Taylor series expansion (EKF, IEKF),
II. parametric filters based on statistical approximations (UKF, GHF, CDF),
III. parametric filters based on Gaussian Sum approximations (GSF),
IV. nonparametric filters based on the importance sampling (BPF),
V. nonparametric filters based on the meanfield controloriented approach (FPF).
In Chapter 4 we investigate the applicability of these nonlinear filters to the estimation problems that originate from the DragHead Model. The problems are: the Cutting Estimation Problem and the Cutting and Jetting Estimation Problem. The Cutting Estimation Problem applies for any cutting excavation tool whereas the Cutting and Jetting Estimation Problem is applicable only for tools equipped with cutting and jetting components. The former problem considers estimation of the ratio kvh between cutting forces and the horizontal cutting force coefficient kch, the latter problem deals with the estimation of the horizontal cutting force coefficient kch and the in situ permeability ksi. To solve the aforementioned estimation problems one needs to handle timevarying delay in the measurement of incoming density ρi, which is discussed separately.
It is concluded that among the tested methods the best solution to the Cutting Estimation Problem is provided by the CDF and, in case of large uncertainty in the initial states, by the GSF. To solve the Cutting and Jetting Estimation Problem it is crucial to exploit the correlation between the horizontal cutting force coefficient kch and the in situ permeability ksi. This is done by a cascaded filter, which uses the PF to obtain an estimate of ksi, which will be further filtered by a Steady State Identification (SSI) filter, and finally by the BF to produce a final estimate of kch.
In Chapter 5 we develop a novel class of nonlinear particle filters: the Saturated Particle Filter (SPF) that is used to solve the Hopper Estimation Problem. The SPF is a general method designed for Saturated Stochastic Dynamical Systems (SSDS), which are severely nonlinear systems often used in modeling reallife problems. They are characterized by a constrained probability distribution exhibiting singularity on the boundary of the saturation region. Such singularities make it difficult to estimate the states or the parameters of SSDSs by standard nonlinear filters. Our new method exploits the specific structure of the SSDS in order to design an importance sampling distribution that accounts for the most recent measurements in the prediction step of the filtering algorithm.
Chapter 6 deals with the asymptotic properties of the SPF. We establish the conditions under which the SPF converges to the optimal theoretical filter. The convergence of our method is closely related to the appropriate resampling scheme. This led to the development of the improved Saturated Particle Filter (iSPF) which combines the importance sampling of the SPF with a novel resampling algorithm.
In Chapter 7 the iSPF together with other nonparametric methods from Chapter 3 are used to estimate the average grain diameter dm, which solves the Hopper Estimation Problem. Because the sedimentation process is naturally divided into three regimes, to find the most efficient filtering method we considered each mode
separately. We conclude that:
I. for the NoOverflow loading phase the best estimate of dm is obtained by the FPF,
II. for the Overflow loading phases with weak erosion, the recommended filtering method is the ReducedOrder PF,
III. for the Overflow loading phases with strong erosion, the best estimation performance is achieved by the ReducedOrder PF when the excavated soil is fine and the Hybrid SPF when the excavated soil is coarse.
The final solution to the Hopper Estimation Problem is obtained by integrating the filters designed for separate modes into a global estimator.
Chapter 8 concludes the thesis.

[PDF]
[Abstract]

5 

Online Model Learning Algorithms for ActorCritic Control
Classical control theory requires a model to be derived for a system, before any control design can take place. This can be a hard, timeconsuming process if the system is complex. Moreover, there is no way of escaping modelling errors. As an alternative approach, there is the possibility of having the system learn a controller by itself while it is in operation or offline. Reinforcement learning (RL) is such a framework in which an agent (or controller) optimises its behaviour by interacting with its environment.
For continuous state and action spaces, the use of function approximators is a necessity and a commonly used type of RL algorithms for these continuous spaces is the actorcritic algorithm, in which two independent function approximators take the role of the policy (the actor) and the value function (the critic).
A main challenge in RL is to use the information gathered during the interaction as efficiently as possible, such that an optimal policy may be reached in a short amount of time. The majority of RL algorithms at each time step measure the state, choose an action corresponding to this state, measure the next state, the corresponding reward and update a value function (and possibly a separate policy). As such, the only source of information used for learning at each time step is the last transition sample.
This thesis proposes novel actorcritic methods that aim to shorten the learning time by using every transition sample collected during learning to learn a model of the system online. It also explores the possibility of speeding up learning by providing the agent with explicit knowledge of the reward function.

[PDF]
[Abstract]

6 

Shaping Methods to Accelerate Reinforcement Learning: From Easy to Challenging Tasks
Reinforcement learning (RL) is one of the most active research areas in artificial intelligence. In RL an agent tries to maximize the total amount of reward it receives while interacting with an environment. The reward is used to improve the policy. Conventional methods of reinforcement learning perform well for simple tasks, but as the task becomes more complex, these methods fail to converge fast or converge to a suboptimal policy. Hence, new methods of RL are needed that can handle complex tasks.
As humans, we simplify a task that is difficult to learn by first learning simplified versions of the task, before moving back to the original task. This idea of starting from simpler tasks and gradually increasing complexity, until the original task is solved, can also be exploited in RL, where it is called shaping. In order to accelerate learning in the original task, shaping methods transfer the knowledge and experiences from the easy task (source task) to the original task (target task). It is believed that the process of gradually increasing the complexity significantly reduces the difficulty of the learning problem. However, sometimes the total required time to solve the easy task plus the original task is larger than starting from scratch. In order to reduce this time, an essential decision in shaping is when to transfer learning from an easier task to a more difficult one. Transferring too early may mean the controller has not learned enough in the easy task, diminishing the usefulness of shaping. Transferring too late could make the controller waste learning time in the easy task, without making significant progress toward solving the original, complex task. If we switch the task at a proper point then the total learning time will decrease. The first part of this thesis is devoted to a thorough empirical study of the shaping methods. In the next part we try to find a suitable performance index that can be used as a switching criterion to reduce the total time needed to learn a task.

[PDF]
[Abstract]

7 

Memorybased Modeling and Prioritized Sweeping in Reinforcement Learning
Reinforcement Learning (RL) is a popular method in machine learning. In RL, an agent learns a policy by observing statetransitions and receiving feedback in the form of a reward signal. The learning problem can be solved by interaction with the system only, without prior knowledge of that system. However, realtime learning from interaction with the system only, leads to slow learning as every timeinterval can only be used to observe a single statetransition. Learning can by accelerated by using a Dynastyle algorithm. This approach learns from interaction with the real system and a model of that system simultaneously. Our research investigates two aspects of this method: Building a model during learning and implementing this model into the learning algorithm.
We use a memorybased modeling method called Local Linear Regression (LLR) to build a statetransition model during the learning process. It is expected that the quality of the model increases as the number of observed statetransitions increase. To assess the quality of the modeled statetransitions we introduce prediction intervals. We show that LLR is able to model various systems, including a complex humanoid robot.
The LLR model was added to the learning algorithm to generate more statetransitions for the agent to learn from. We show that an increasing number of experiences leads to faster learning. We introduce Prioritized Sweeping (PS) and Look Ahead (LA) Dyna as possibilities to use the model more efficiently. We show how prediction intervals can be used to increase the performance of the various algorithms. The learning algorithms were compared using an inverted pendulum simulation, which had to learn a swingup control task.

[PDF]
[Abstract]

8 

Hardware of the Respiratory Simulator: improving and stabilizing the lung simulator
The advanced technology of today has reduced the need of testing medical equipment on human patients. The invention of the lung simulator has made it possible to test ventilators without putting the condition of test patients in jeopardy. The device also allows medical staff to be trained in a safe and more rapid way.
Several types of lung simulators are available. The digitally controlled variant is the most versatile one for simulations. This type has easily adjustable static and dynamic properties, which, in contrary to traditional passive mechanical simulators, make it possible to simulate different types of breathing.
The available design of a digitally controlled lung simulator, however, has one major drawback. The system uses a motorball screw assembly to drive a piston in the air compartment. One of the properties of this assembly is that the motor shaft motion is negligibly affected (not ‘backdrivable’) by air pressure exerted on the piston by an external source. The result of this is a feedback system that needs to realize the entire dynamic response to pressure changes, by active control of the motor. Because of this, the system can easily become unstable for certain settings of the static and dynamic properties.
The objective of this thesis is to determine what the best alternative actuator is for the lung simulator, to the end of making it more stable than the current version. By improving the existing lung simulator and improving its range of operation, and its performance, it can be made more effective and accurate, which will result in better healthcare.
To reach this goal, firstly the desired requirements for the lung simulator will be obtained from the end user. After this, a literature study about alternative actuators shall be conducted. With the results of the literature study, a ranking of the alternative actuators are determined.
From the literature study eight alternative actuators have come up. These are the moving coil actuator, moving iron actuator, permanent magnet synchronous motor (PMSM), ironless core motor, piezoelectric actuator, pneumatic actuator, hydraulic actuator and wax motor. The most important requirements on which the different actuators are evaluated are backdrivability, force linearity, force, precision, speed and stroke length.
Of the eight alternative actuators, except for the PMSM and the ironless core actuator, all the others have an insufficient score on one or more of the requirements. A comparison of the PMSM and the ironless core actuator shows that the backdrivability of the ironless core actuator is better, and that it has the best overall score.
The conclusion is that the ironless core actuator is the most suitable alternative for the lung simulator. The use of the actuator should increase the stability of the lung simulator. However, no testing was possible because of malfunctioning hardware and time shortage. Therefore no practical confirmation could be obtained about the effects of the new actuator on the system.
It is recommended to put the topranked actuators to the test in a prototype. From the test results it will be possible to confirm whether the instability of the original lung simulator was indeed caused by the actuator, and whether the ironless core motor is indeed the best suitable alternative actuator for the lung simulator.

[PDF]
[Abstract]

9 

Identification and Feedforward Control of a Dropondemand Inkjet Printhead

[PDF]

10 

Cogging compensation in embedded brushless motor control for haptics applications
For the development of a space qualified, 7 degrees of freedom, haptic arm exoskeleton a brushless DC motor with high torque production, power density and efficiency was selected. However, the attraction of the rotor magnets to the stator teeth introduces an additive, position dependent, torque disturbance on the motor shaft called cogging torque. In haptic applications this disturbance impacts the realism of the force reflection and limits control fidelity during masterslave teleoperation. This thesis investigates possible solutions within the constraints of the envisioned application. Due to volume, torque production and efficiency requirements the chosen motor is not to be changed. This rules out the use of motor design based cogging minimization techniques. For this reason only controlbased methods are considered. Two methods to identify the cogging waveform were developed. This data was used to do feedforward compensation using a lookuptable approach and a Fourier series approximation. For comparison, a PID feedback compensation and hybrid approach were also tested. Identification of the cogging torque and testing of compensation methods is done using a custom build measurement setup. A reduction of the RMS cogging of 39% was achieved using the feedforward approach, while the PID feedback loop resulted in a 46% reduction. A combination of these two methods in the hybrid approach resulted in a reduction of 75%.
Thesis done as part of a 'double degree' with Systems & Control at 3mE (DCSC) and Embedded Systems at EWI.

[PDF]
[Abstract]

11 

Control of a macromicro actuated system
TU Delft BioRobotics Lab. developed a new type of robotic arm, called a macromicro actuated system for the inherent safety characteristics in humaninteraction environment. The macromicro actuated system is equipped with a macro actuator and a micro actuator which works together to control one joint angle. The macro actuator is connected to the link through a low stiffness spring, and the micro actuator is directly connected to the link. This was designed successfully to have the inherent safety of the robot arm. However, there remains the challenge of controlling the manipulator to possess the inherent safety with highbandwidth performance and robustness, when the robot arm performs pickandplace work of an unknown mass.
In this paper, first, the macromicro actuated system of the Delft robot arm is introduced. Second, the control challenge to have safety with highbandwidth performance is discussed along with possible control schemes for their successful implementation. In the chosen control scheme, Balancing/Tracking mode, the macro part is controlled to compensate for gravity and the micro part is controlled to realize the desired trajectory. Third, a feedback linearization method with a sliding mode algorithm and an indirect adaptive algorithm is surveyed to decrease nonlinearities of the system and to have robustness. For a fast estimation of an unknown variable and simplicity of a control algorithm, the sliding mode algorithm is applied to the chosen control scheme with the feedback linearization method. Finally, the Balancing/Tracking scheme with the feedback linearization method and the sliding mode algorithm is proposed as the controller of the macromicro actuated s ystem.

[PDF]
[Abstract]

12 

Embedding machine learning into passivity theory: a portHamiltonian approach.
Passivitybased control (PBC) is a control methodology that achieves its control objective by rendering a system passive with respect to a desired storage function. A key feature of PBC is that it exploits structural properties of the system. In this thesis, the PBC of systems endowed with a special structure, called portHamiltonian (PH) systems, has been investigated. The geometric structure of PH systems allows reformulating the PBC problem in terms of solving a generally complex partial differential equation (PDE).
Reinforcement learning (RL), on the other hand, is a learning control method that can solve complex nonlinear (stochastic) control problems without the need for a process model or explicitly solving a set of equations. In RL the controller receives an immediate numerical reward as a function of the process state and possibly control action. The goal is to find an optimal control policy that maximizes the cumulative longterm rewards, which corresponds to maximizing a value function. In this thesis, actorcritic techniques have been used, which are a class of RL methods in which a separate actor (the control law) and critic (a "memory") function are learned. A disadvantage of RL is that without having a process model it can be slow at learning and computationally expensive.
In this thesis, the goal was to design a theoretical framework using PBC techniques subject to control saturation that incorporates knowledge about the PH system and learns (optimal) control policies using actorcritic reinforcement learning. Therefore, actorcritic reinforcement learning methods have been combined with different types of PBC, e.g. EBPBC and IDAPBC. The combination of EBPBC with an actorcritic method, energybalancing actorcritic (EBAC), showed its effectiveness in the pendulum swingup problem, which was used as a benchmark test. The advantages of the method from a PBC perspective are that no PDE has to be explicitly solved, control saturation can be incorporated, the geometric structure of the PH system is preserved, (numerical) stability can be assessed using passivity theory and the learned controllers can be interpreted in terms of energyshaping strategies. From a RL perspective, the learning is speeded up because model knowledge is available.

[PDF]
[Abstract]

13 

Learning Parameter Selection in Continuous Reinforcement Learning: Attempting to Reduce Tuning Effords
The reinforcement learning (RL) framework enables to construct controllers that try to find find an optimal control strategy in an unknown environment by trialanderror. After selecting a control action, the controller receives a numerical reward. The reward signal is based on the current state of the environment and the applied control action. The controller aims to maximize the cumulative reward, known as the return. In this thesis actorcritic and criticonly RL algorithms are considered. Actorcritic algorithms consist of an element that selects the actions (the actor) and an element that learns the expectation of the return (the critic). This expectation is captured in a value function. The critic is used to improve the control policy of the actor. Criticonly algorithms select the action by direct optimization over a value function.
Before a RL algorithm can be applied to a control problem, a number of learning parameters need to be set. The optimal values of some of these parameters are highly problem dependent. It is not straightforward how these parameters should be chosen and often these are often determined by trying a large set of parameters. The main focus of thesis is to devise an action selection method that is able to select continuous actions without problem dependent parameters. Two approaches are taken: first, it is investigated if Levenberg Marquardt (LM), a popular optimization method, can be used to determine the actor update step. Second, an action selection method is treated that lacks an explicit actor, called ValueGradient Based Policy (VGBP).
The LM algorithm uses the gradient and the Hessian to compute the update step. Therefore the policy gradient and Hessian need to be found. A novel actorcritic method has been devised, called Vanilla ActorCritic (VAC), that efficiently learns the policy gradient. On the inverted pendulum swingup task this algorithm outperformed Natural ActorCritic (NAC). A number of different approaches have been taken to approximate the policy Hessian, but none delivered a proper Hessian estimate. Therefore, no LM actor update method was created.
In VGBP the action is found by optimization of the right hand side of the Bellman equation. VGBP uses the provided reward function and a process model for this optimization. The process model is learned online using local linear regression (LLR). Due to the efficient use of information VGBP shows fast learning on the pendulum and a 2DOF robotic arm.

[PDF]
[Abstract]

14 

Imitation learning for a robotic precision placement task
In industrial environments robots are used for various tasks. At this moment it is not feasible for companies to deploy robots for productions with a limited batch size or for products with large variations. The use of robots for such environments can become feasible through a new generation of robots and software which can adapt quickly to new situations and learn from their mistakes while being programmable without needing an expert.
A concept that can enable the transition to flexible robotics is the combination of imitation learning and reinforcement learning. The purpose of imitation learning is to learn a task by generalizing from observations. The power of imitation learning is that the robot is programmed in an intuitive way while the insight of the teacher is incorporated in the execution of the task.
This research studies the combination of imitation and reinforcement learning, the research is applied to an industrial usecase. The research question of this study is: "Can imitation learning be combined with reinforcement learning to achieve a successful application in an industrial robotic precision placement task?"
To imitate the demonstrated trajectories, Dynamic Movement Primitives (DMPs) are used. The DMPs are used to encode the observed trajectories. DMPs can be seen as a springdamper like system with a nonlinear forcing term. The forcing term is a sum of Gaussian basis functions with each its corresponding weight. Reinforcement learning can be applied to these weights to alter the shape of the trajectory created by a DMP. Policy Gradients with Parameter based Exploration (PGPE) is used as reinforcement learning algorithm to optimize the recorded trajectories.
Experiments done on a UR5 show that without the learning step, the DMPs are able to provide a trajectory that results in a successful execution of a robotic precision placement task. The experiments also show that the learning algorithm is not able to remove noise from a demonstrated trajectory or complete a partial demonstrated trajectory. Therefore it can be concluded that the PGPE algorithm is not suited for reinforcement learning in robotics in its current form. It is therefore recommended to apply a dataefficient version of the PGPE algorithm in order to achieve better learning results.

[PDF]
[Abstract]

15 

RealTime Optimistic Planning for the Control of Nonlinear Systems
Optimistic Planning is a modelbased online planning algorithm that guarantees nearoptimal actions for the control arbitrarily nonlinear systems. Planning algorithms aim to find optimal actions by starting from the current state and developing a tree representation of sequences of actions and resulting states, using a model to simulate statetransitions. Typically, online planning algorithms return a sequence of actions, apply the first action (or several actions at the start of the sequence) and start planning again from the new state, resembling the receding horizon principle as seen in Model Predictive Control.
Several optimistic planning algorithms exist, of which in this work only Optimistic Planning for Deterministic systems (OPD) is considered. OPD works for large, possibly infinite state spaces, but only for finite, discrete action spaces. Unfortunately, while OPD shows good theoretical nearoptimality guarantees, there is no record yet of OPD being applied to control nonlinear physical systems in realtime. This is because of the (long) computation required by OPD.
This work analyzes two main methods that can be used to make OPD suitable for realtime applications. The first approach is to increase the computational speed of the planning process by parallelizing the algorithm. Unfortunately, while parallelization has been proven to be able to increase the computational speed in classical planning, in experiments no improvement is found yet for OPD using parallelization. However, a potential benefit from creating a parallel version of OPD is not ruled out and it is expected that more research and more efficient implementations could still lead to an increase in the computational speed.
The second approach is to apply sequences of actions instead of single actions, which increases the time available for the planning process. Replanning starts immediately after a sequence is returned, using as initial state a prediction of the state at the end of the previous sequence. The resulting algorithm is called RealTime Optimistic Planning with Action Sequences (RTOPS). Extensive analysis is performed to find restrictions on the parameters of the algorithm that, when met, can guarantee realtime applicability. Additionally, the effect of using sequences of actions on the performance of the algorithm is investigated and bounds are put on the maximum performance loss.
The performance of RTOPS has been tested in various experiments on different problems: a cartpole simulation, an acrobot simulation and a real inverted pendulum. Different settings are compared and, overall, RTOPS proves to perform well, without violating realtime constraints. The experiments prove that RTOPS allows for the use of optimistic planning for realtime control of physical nonlinear systems.
Future work should focus on applying the ideas used to develop RTOPS to other optimistic planning algorithms, such as those that allow for continuous actions or stochastic systems. Furthermore, a parallelization of RTOPS could be developed that increases its computational speed.

[PDF]
[Abstract]

16 

Distributed Estimation and Control for Robotic Networks
Mobile robots that communicate and cooperate to achieve a common task have been the subject of an increasing research interest in recent years. These possibly heterogeneous groups of robots communicate locally via a communication network and therefore are usually referred to as robotic networks. Their potential applications are diverse and encompass monitoring, exploration, search and rescue, and disaster relief. From a research standpoint, in this thesis we consider specific aspects related to the foundations of robotic network algorithmic development: distributed estimation, control, and optimization.
The word “distributed” refers to situations in which the cooperating robots have a limited, local knowledge of the environment and of the group, as opposed to a “centralized” scenario, where all the robots have access to the complete information. The typical challenge in distributed systems is to achieve similar results (in terms of performance of the estimation, control, or optimization task) with respect to a centralized system without extensive communication among the cooperating robots.
In this thesis we develop effective distributed estimation, control, and optimization algorithms tailored to the distributed nature of robotic networks. These algorithms strive for limiting the local communication among the mobile robots, in order to be applicable in practical situations. In particular, we focus on issues related to nonlinearities of the dynamical model of the robots and their sensors, to the connectivity of the communication graph through which the robots interact, and to fast feasible solutions for the common (estimation or control) objective.

[PDF]
[Abstract]

17 

Ant Dispersion Routing for Traffic Optimization

[PDF]

18 

Vision based robust control of a rotational pendulum

[PDF]

19 

A visionbased motion estimator for a legged robot.

[PDF]

20 

Loop closing in SLAM using ant colony optimization
For service robots to be employed in normal human environments, autonomous navigation is an important requirement. To enable autonomous navigation, the robot needs to be able to build a map of any unknown environment. The problem of letting a robot build a map and simultaneously localizing itself within the same map is known in robotics as the Simultaneous Localization and Mapping (SLAM) problem.
An important problem in SLAM is known as the loop closing problem. This problem occurs when the trajectory of the robot contains a large loop. After traversing this loop, the accumulated pose error causes the algorithm to fail to recognize the robot has returned to its original position.
In this thesis a novel algorithm is introduced to improve the loop closing behaviour of a widely used SLAM algorithm, called FastSLAM. FastSLAM uses a particle filter to estimate the pose of the robot. The resampling step in the particle filter algorithm causes particle depletion, which inhibits correct loop closures. The proposed algorithm, called Ant Colony Optimization (ACO)SLAM, uses ACO to improve the resampling step of the FastSLAM algorithm. ACOSLAM does this by incorporating a measure of the map consistency and a measure for correct loop closures into the resampling step.
Simulations are used to compare the performance of both algorithms with varying noise settings. Robby, the personal robot developed by the Delft Biorobotics lab, is used for experiments to test the performance of the algorithms in a realistic human environment.

[PDF]
[Abstract]
