1 

Simulation of an Artificial Respiratory System: Choosing a New Actuator for Implementation in a Lung Simulator
It is suspected that instability problems in the current generation of lung simulators are caused by its actuator, a brushless DC motor, in combination with the system configuration. The hypothesis is that these problems can be resolved by replacing the actuator with a backdrivable actuator (that is, an actuator that responds well to external force) in a new system.
In this BSc Thesis this hypothesis is researched. The backdrivable actuator (in this particular case, a Voice Coil actuator) in a new system can overcome the instability problems.

[PDF]
[Abstract]

2 

Featherweight Camera stabilisatie systeem: Gebruiksonderzoek, ontwerp en prototype
De professionele paraglider en fotograaf Leo Westerkamp is instaat de kijker dichter bij de natuur en sport te brengen dan ieder ander log en niet wendbaar apparaat. Echter door bochten, windvlagen, luchtdrukverschillen en trillingen van de motor ondervinden de beelden een ernstig stabiliteitsprobleem. Vandaar dan ook dat Featherweight de opdracht heeft gekregen om de camerabeelden zodanig te stabiliseren dat de beelden te verkopen zijn.
Tijdens het ontwerpproces is er hoofdzakelijk rekening gehouden met de mogelijkheid de camerastabilisator lichtgewicht en makkelijk bestuurbaar te maken. Aan de andere kant mocht hierbij niet worden ingeleverd op de kwaliteit van de beelden.
Ontwerp
Om te voldoen aan de wensen van Westerkamp heeft Featherweight gekozen voor de Featherweight Professional. De Featherweight Professional bestaat uit een opzet waar de camera gestabiliseerd wordt door twee gekoppelde dc motoren. Westerkamp heeft zelf de controle om de camera ergens anders op te richten via een duimcontroller, en via de aangeraden LCDbril is hij instaat zelf realtime te kijken waar hij de camera op heeft gefocust. De motoren worden aangestuurd door een laptop, wiens data verkregen wordt door een microcontroller (ArduinoMega) en de positie van de duimcontroller. De ArduinoMega is er om de data van de acceleratiemeter en gyrometer om te zetten naar de afwijkende hoeken die ontstaan zijn door stabilisatie problemen.
Implementatie
De data van de acceleratiemeter en gyrometer wordt via een I2C protocol naar de ArduinoMega gestuurd. Hier wordt door gebruik te maken van een versimpelde Kalman filter de verkregen data omgezet naar het aantal graden dat de camera afwijkt van het punt waar hij op gestabiliseerd moet worden. De ArduinoMega stuurt het op zijn beurt door naar de laptop met de RS232 interface, waar het wordt verwerkt met MATLAB. Bij MATLAB ligt de nadruk op:
 De hoeken aanpassen door de data van de duimcontroller en de Arduino Mega te combineren;
 De hoeken begrenzen;
 De hoeken en hoeksnelheden via een adaptieve PD regelaar om te zetten naar het gewenste koppel, waar de twee RX64 motoren mee aangestuurd worden.
Daarnaast worden de huidige beelden waar de camera opgericht staat, weergegeven via een LCDscherm.
Evaluatie
Bij de evaluatie kwam naar voren dat veel van de eisen uit het programma van eisen voldaan werden, voor zover deze geïmplementeerd waren in het prototype. Het systeem is licht en compact genoeg en stabiliseert de grootste verstoringen uit het beeld. Daarnaast is het systeem makkelijk te bedienen en niet hinderlijk voor de piloot.
Aanbevelingen
Er wordt aanbevolen, voordat de Featherweight Professional in ontwikkeling wordt genomen, nader onderzoek te doen naar de gebruikerswensen, toepassen van een normale Kalman filter op de sensordata, en het omlaag brengen van de verwerkingstijd.
Ook wordt er aangeraden om mogelijk over te stappen naar een gimbal constructie, het toepassen van een lockfunctie zodat de camera altijd het gewenste object volgt, en het mogelijk toepassen van een softwarematige stabilisatie.
Om Featherweight Professional op een bredere markt te verkopen, moet er onderzoek gedaan worden naar mogelijke klanten. Dit onderzoek kan gedaan worden door een gebruikersonderzoek uit te voeren op een testgroep.

[PDF]
[Abstract]

3 

Nonlinear State and Parameter Estimation for Hopper Dredgers
A Trailing Suction Hopper Dredger (TSHD) is a ship that excavates sediments from the sea bottom while sailing. In situ material is excavated with a special tool called the DragHead, then it is hydraulically transported through a pipe to the hopper where it is temporarily stored. After the dredging is completed the collected material is transported and discharged at a specified location. The efficiency of this process is highly dependent on the detailed knowledge of the excavated soil.
The optimization of dredging operations is of vital importance for future improvement in efficiency, accuracy and from the viewpoint of labor saving. The automated onboard systems that have been developed to optimize the dredging performance require knowledge of several uncertain soildependent parameters. These cannot be directly measured but have to be estimated online from the available measurements. Such estimation is a challenging task due to lack of sufficient sensors, severe nonlinearities in models, and timevarying nature of the parameters of interest.
In this thesis we focus on two of the most important TSHDrelated models. These are:
I. DragHead Model  describing the excavation process,
II. Hopper Model  describing the sedimentation process occurring inside the hopper.
They contain several uncertain soildependent parameters that need to be estimated.
These are:
I. horizontal cutting force coefficient kch (DragHead Model ),
II. ratio kvh between the horizontal and vertical cutting forces (DragHead Model ),
III. in situ permeability ksi (DragHead Model ),
IV. average grain diameter dm (Hopper Model ).
Both processes, together with the corresponding estimation problems, are discussed in detail in Chapter 2.
The highly uncertain and timevarying nature of the soildependent parameters and the nonlinear dynamics of the models used to describe dredging process make the estimation a challenging task. The algorithms that are capable of tackling these type of problems are Nonlinear Bayesian Filters (NBF). In Chapter 3 we review
several types of NBF, namely:
I. parametric filters based on the Taylor series expansion (EKF, IEKF),
II. parametric filters based on statistical approximations (UKF, GHF, CDF),
III. parametric filters based on Gaussian Sum approximations (GSF),
IV. nonparametric filters based on the importance sampling (BPF),
V. nonparametric filters based on the meanfield controloriented approach (FPF).
In Chapter 4 we investigate the applicability of these nonlinear filters to the estimation problems that originate from the DragHead Model. The problems are: the Cutting Estimation Problem and the Cutting and Jetting Estimation Problem. The Cutting Estimation Problem applies for any cutting excavation tool whereas the Cutting and Jetting Estimation Problem is applicable only for tools equipped with cutting and jetting components. The former problem considers estimation of the ratio kvh between cutting forces and the horizontal cutting force coefficient kch, the latter problem deals with the estimation of the horizontal cutting force coefficient kch and the in situ permeability ksi. To solve the aforementioned estimation problems one needs to handle timevarying delay in the measurement of incoming density ρi, which is discussed separately.
It is concluded that among the tested methods the best solution to the Cutting Estimation Problem is provided by the CDF and, in case of large uncertainty in the initial states, by the GSF. To solve the Cutting and Jetting Estimation Problem it is crucial to exploit the correlation between the horizontal cutting force coefficient kch and the in situ permeability ksi. This is done by a cascaded filter, which uses the PF to obtain an estimate of ksi, which will be further filtered by a Steady State Identification (SSI) filter, and finally by the BF to produce a final estimate of kch.
In Chapter 5 we develop a novel class of nonlinear particle filters: the Saturated Particle Filter (SPF) that is used to solve the Hopper Estimation Problem. The SPF is a general method designed for Saturated Stochastic Dynamical Systems (SSDS), which are severely nonlinear systems often used in modeling reallife problems. They are characterized by a constrained probability distribution exhibiting singularity on the boundary of the saturation region. Such singularities make it difficult to estimate the states or the parameters of SSDSs by standard nonlinear filters. Our new method exploits the specific structure of the SSDS in order to design an importance sampling distribution that accounts for the most recent measurements in the prediction step of the filtering algorithm.
Chapter 6 deals with the asymptotic properties of the SPF. We establish the conditions under which the SPF converges to the optimal theoretical filter. The convergence of our method is closely related to the appropriate resampling scheme. This led to the development of the improved Saturated Particle Filter (iSPF) which combines the importance sampling of the SPF with a novel resampling algorithm.
In Chapter 7 the iSPF together with other nonparametric methods from Chapter 3 are used to estimate the average grain diameter dm, which solves the Hopper Estimation Problem. Because the sedimentation process is naturally divided into three regimes, to find the most efficient filtering method we considered each mode
separately. We conclude that:
I. for the NoOverflow loading phase the best estimate of dm is obtained by the FPF,
II. for the Overflow loading phases with weak erosion, the recommended filtering method is the ReducedOrder PF,
III. for the Overflow loading phases with strong erosion, the best estimation performance is achieved by the ReducedOrder PF when the excavated soil is fine and the Hybrid SPF when the excavated soil is coarse.
The final solution to the Hopper Estimation Problem is obtained by integrating the filters designed for separate modes into a global estimator.
Chapter 8 concludes the thesis.

[PDF]
[Abstract]

4 

Shaping Methods to Accelerate Reinforcement Learning: From Easy to Challenging Tasks
Reinforcement learning (RL) is one of the most active research areas in artificial intelligence. In RL an agent tries to maximize the total amount of reward it receives while interacting with an environment. The reward is used to improve the policy. Conventional methods of reinforcement learning perform well for simple tasks, but as the task becomes more complex, these methods fail to converge fast or converge to a suboptimal policy. Hence, new methods of RL are needed that can handle complex tasks.
As humans, we simplify a task that is difficult to learn by first learning simplified versions of the task, before moving back to the original task. This idea of starting from simpler tasks and gradually increasing complexity, until the original task is solved, can also be exploited in RL, where it is called shaping. In order to accelerate learning in the original task, shaping methods transfer the knowledge and experiences from the easy task (source task) to the original task (target task). It is believed that the process of gradually increasing the complexity significantly reduces the difficulty of the learning problem. However, sometimes the total required time to solve the easy task plus the original task is larger than starting from scratch. In order to reduce this time, an essential decision in shaping is when to transfer learning from an easier task to a more difficult one. Transferring too early may mean the controller has not learned enough in the easy task, diminishing the usefulness of shaping. Transferring too late could make the controller waste learning time in the easy task, without making significant progress toward solving the original, complex task. If we switch the task at a proper point then the total learning time will decrease. The first part of this thesis is devoted to a thorough empirical study of the shaping methods. In the next part we try to find a suitable performance index that can be used as a switching criterion to reduce the total time needed to learn a task.

[PDF]
[Abstract]

5 

Memorybased Modeling and Prioritized Sweeping in Reinforcement Learning
Reinforcement Learning (RL) is a popular method in machine learning. In RL, an agent learns a policy by observing statetransitions and receiving feedback in the form of a reward signal. The learning problem can be solved by interaction with the system only, without prior knowledge of that system. However, realtime learning from interaction with the system only, leads to slow learning as every timeinterval can only be used to observe a single statetransition. Learning can by accelerated by using a Dynastyle algorithm. This approach learns from interaction with the real system and a model of that system simultaneously. Our research investigates two aspects of this method: Building a model during learning and implementing this model into the learning algorithm.
We use a memorybased modeling method called Local Linear Regression (LLR) to build a statetransition model during the learning process. It is expected that the quality of the model increases as the number of observed statetransitions increase. To assess the quality of the modeled statetransitions we introduce prediction intervals. We show that LLR is able to model various systems, including a complex humanoid robot.
The LLR model was added to the learning algorithm to generate more statetransitions for the agent to learn from. We show that an increasing number of experiences leads to faster learning. We introduce Prioritized Sweeping (PS) and Look Ahead (LA) Dyna as possibilities to use the model more efficiently. We show how prediction intervals can be used to increase the performance of the various algorithms. The learning algorithms were compared using an inverted pendulum simulation, which had to learn a swingup control task.

[PDF]
[Abstract]

6 

Hardware of the Respiratory Simulator: improving and stabilizing the lung simulator
The advanced technology of today has reduced the need of testing medical equipment on human patients. The invention of the lung simulator has made it possible to test ventilators without putting the condition of test patients in jeopardy. The device also allows medical staff to be trained in a safe and more rapid way.
Several types of lung simulators are available. The digitally controlled variant is the most versatile one for simulations. This type has easily adjustable static and dynamic properties, which, in contrary to traditional passive mechanical simulators, make it possible to simulate different types of breathing.
The available design of a digitally controlled lung simulator, however, has one major drawback. The system uses a motorball screw assembly to drive a piston in the air compartment. One of the properties of this assembly is that the motor shaft motion is negligibly affected (not ‘backdrivable’) by air pressure exerted on the piston by an external source. The result of this is a feedback system that needs to realize the entire dynamic response to pressure changes, by active control of the motor. Because of this, the system can easily become unstable for certain settings of the static and dynamic properties.
The objective of this thesis is to determine what the best alternative actuator is for the lung simulator, to the end of making it more stable than the current version. By improving the existing lung simulator and improving its range of operation, and its performance, it can be made more effective and accurate, which will result in better healthcare.
To reach this goal, firstly the desired requirements for the lung simulator will be obtained from the end user. After this, a literature study about alternative actuators shall be conducted. With the results of the literature study, a ranking of the alternative actuators are determined.
From the literature study eight alternative actuators have come up. These are the moving coil actuator, moving iron actuator, permanent magnet synchronous motor (PMSM), ironless core motor, piezoelectric actuator, pneumatic actuator, hydraulic actuator and wax motor. The most important requirements on which the different actuators are evaluated are backdrivability, force linearity, force, precision, speed and stroke length.
Of the eight alternative actuators, except for the PMSM and the ironless core actuator, all the others have an insufficient score on one or more of the requirements. A comparison of the PMSM and the ironless core actuator shows that the backdrivability of the ironless core actuator is better, and that it has the best overall score.
The conclusion is that the ironless core actuator is the most suitable alternative for the lung simulator. The use of the actuator should increase the stability of the lung simulator. However, no testing was possible because of malfunctioning hardware and time shortage. Therefore no practical confirmation could be obtained about the effects of the new actuator on the system.
It is recommended to put the topranked actuators to the test in a prototype. From the test results it will be possible to confirm whether the instability of the original lung simulator was indeed caused by the actuator, and whether the ironless core motor is indeed the best suitable alternative actuator for the lung simulator.

[PDF]
[Abstract]

7 

Identification and Feedforward Control of a Dropondemand Inkjet Printhead

[PDF]

8 

Cogging compensation in embedded brushless motor control for haptics applications
For the development of a space qualified, 7 degrees of freedom, haptic arm exoskeleton a brushless DC motor with high torque production, power density and efficiency was selected. However, the attraction of the rotor magnets to the stator teeth introduces an additive, position dependent, torque disturbance on the motor shaft called cogging torque. In haptic applications this disturbance impacts the realism of the force reflection and limits control fidelity during masterslave teleoperation. This thesis investigates possible solutions within the constraints of the envisioned application. Due to volume, torque production and efficiency requirements the chosen motor is not to be changed. This rules out the use of motor design based cogging minimization techniques. For this reason only controlbased methods are considered. Two methods to identify the cogging waveform were developed. This data was used to do feedforward compensation using a lookuptable approach and a Fourier series approximation. For comparison, a PID feedback compensation and hybrid approach were also tested. Identification of the cogging torque and testing of compensation methods is done using a custom build measurement setup. A reduction of the RMS cogging of 39% was achieved using the feedforward approach, while the PID feedback loop resulted in a 46% reduction. A combination of these two methods in the hybrid approach resulted in a reduction of 75%.
Thesis done as part of a 'double degree' with Systems & Control at 3mE (DCSC) and Embedded Systems at EWI.

[PDF]
[Abstract]

9 

Control of a macromicro actuated system
TU Delft BioRobotics Lab. developed a new type of robotic arm, called a macromicro actuated system for the inherent safety characteristics in humaninteraction environment. The macromicro actuated system is equipped with a macro actuator and a micro actuator which works together to control one joint angle. The macro actuator is connected to the link through a low stiffness spring, and the micro actuator is directly connected to the link. This was designed successfully to have the inherent safety of the robot arm. However, there remains the challenge of controlling the manipulator to possess the inherent safety with highbandwidth performance and robustness, when the robot arm performs pickandplace work of an unknown mass.
In this paper, first, the macromicro actuated system of the Delft robot arm is introduced. Second, the control challenge to have safety with highbandwidth performance is discussed along with possible control schemes for their successful implementation. In the chosen control scheme, Balancing/Tracking mode, the macro part is controlled to compensate for gravity and the micro part is controlled to realize the desired trajectory. Third, a feedback linearization method with a sliding mode algorithm and an indirect adaptive algorithm is surveyed to decrease nonlinearities of the system and to have robustness. For a fast estimation of an unknown variable and simplicity of a control algorithm, the sliding mode algorithm is applied to the chosen control scheme with the feedback linearization method. Finally, the Balancing/Tracking scheme with the feedback linearization method and the sliding mode algorithm is proposed as the controller of the macromicro actuated s ystem.

[PDF]
[Abstract]

10 

Embedding machine learning into passivity theory: a portHamiltonian approach.
Passivitybased control (PBC) is a control methodology that achieves its control objective by rendering a system passive with respect to a desired storage function. A key feature of PBC is that it exploits structural properties of the system. In this thesis, the PBC of systems endowed with a special structure, called portHamiltonian (PH) systems, has been investigated. The geometric structure of PH systems allows reformulating the PBC problem in terms of solving a generally complex partial differential equation (PDE).
Reinforcement learning (RL), on the other hand, is a learning control method that can solve complex nonlinear (stochastic) control problems without the need for a process model or explicitly solving a set of equations. In RL the controller receives an immediate numerical reward as a function of the process state and possibly control action. The goal is to find an optimal control policy that maximizes the cumulative longterm rewards, which corresponds to maximizing a value function. In this thesis, actorcritic techniques have been used, which are a class of RL methods in which a separate actor (the control law) and critic (a "memory") function are learned. A disadvantage of RL is that without having a process model it can be slow at learning and computationally expensive.
In this thesis, the goal was to design a theoretical framework using PBC techniques subject to control saturation that incorporates knowledge about the PH system and learns (optimal) control policies using actorcritic reinforcement learning. Therefore, actorcritic reinforcement learning methods have been combined with different types of PBC, e.g. EBPBC and IDAPBC. The combination of EBPBC with an actorcritic method, energybalancing actorcritic (EBAC), showed its effectiveness in the pendulum swingup problem, which was used as a benchmark test. The advantages of the method from a PBC perspective are that no PDE has to be explicitly solved, control saturation can be incorporated, the geometric structure of the PH system is preserved, (numerical) stability can be assessed using passivity theory and the learned controllers can be interpreted in terms of energyshaping strategies. From a RL perspective, the learning is speeded up because model knowledge is available.

[PDF]
[Abstract]

11 

RealTime Optimistic Planning for the Control of Nonlinear Systems
Optimistic Planning is a modelbased online planning algorithm that guarantees nearoptimal actions for the control arbitrarily nonlinear systems. Planning algorithms aim to find optimal actions by starting from the current state and developing a tree representation of sequences of actions and resulting states, using a model to simulate statetransitions. Typically, online planning algorithms return a sequence of actions, apply the first action (or several actions at the start of the sequence) and start planning again from the new state, resembling the receding horizon principle as seen in Model Predictive Control.
Several optimistic planning algorithms exist, of which in this work only Optimistic Planning for Deterministic systems (OPD) is considered. OPD works for large, possibly infinite state spaces, but only for finite, discrete action spaces. Unfortunately, while OPD shows good theoretical nearoptimality guarantees, there is no record yet of OPD being applied to control nonlinear physical systems in realtime. This is because of the (long) computation required by OPD.
This work analyzes two main methods that can be used to make OPD suitable for realtime applications. The first approach is to increase the computational speed of the planning process by parallelizing the algorithm. Unfortunately, while parallelization has been proven to be able to increase the computational speed in classical planning, in experiments no improvement is found yet for OPD using parallelization. However, a potential benefit from creating a parallel version of OPD is not ruled out and it is expected that more research and more efficient implementations could still lead to an increase in the computational speed.
The second approach is to apply sequences of actions instead of single actions, which increases the time available for the planning process. Replanning starts immediately after a sequence is returned, using as initial state a prediction of the state at the end of the previous sequence. The resulting algorithm is called RealTime Optimistic Planning with Action Sequences (RTOPS). Extensive analysis is performed to find restrictions on the parameters of the algorithm that, when met, can guarantee realtime applicability. Additionally, the effect of using sequences of actions on the performance of the algorithm is investigated and bounds are put on the maximum performance loss.
The performance of RTOPS has been tested in various experiments on different problems: a cartpole simulation, an acrobot simulation and a real inverted pendulum. Different settings are compared and, overall, RTOPS proves to perform well, without violating realtime constraints. The experiments prove that RTOPS allows for the use of optimistic planning for realtime control of physical nonlinear systems.
Future work should focus on applying the ideas used to develop RTOPS to other optimistic planning algorithms, such as those that allow for continuous actions or stochastic systems. Furthermore, a parallelization of RTOPS could be developed that increases its computational speed.

file embargo until: 20150307
[Abstract]

12 

Learning Parameter Selection in Continuous Reinforcement Learning: Attempting to Reduce Tuning Effords
The reinforcement learning (RL) framework enables to construct controllers that try to find find an optimal control strategy in an unknown environment by trialanderror. After selecting a control action, the controller receives a numerical reward. The reward signal is based on the current state of the environment and the applied control action. The controller aims to maximize the cumulative reward, known as the return. In this thesis actorcritic and criticonly RL algorithms are considered. Actorcritic algorithms consist of an element that selects the actions (the actor) and an element that learns the expectation of the return (the critic). This expectation is captured in a value function. The critic is used to improve the control policy of the actor. Criticonly algorithms select the action by direct optimization over a value function.
Before a RL algorithm can be applied to a control problem, a number of learning parameters need to be set. The optimal values of some of these parameters are highly problem dependent. It is not straightforward how these parameters should be chosen and often these are often determined by trying a large set of parameters. The main focus of thesis is to devise an action selection method that is able to select continuous actions without problem dependent parameters. Two approaches are taken: first, it is investigated if Levenberg Marquardt (LM), a popular optimization method, can be used to determine the actor update step. Second, an action selection method is treated that lacks an explicit actor, called ValueGradient Based Policy (VGBP).
The LM algorithm uses the gradient and the Hessian to compute the update step. Therefore the policy gradient and Hessian need to be found. A novel actorcritic method has been devised, called Vanilla ActorCritic (VAC), that efficiently learns the policy gradient. On the inverted pendulum swingup task this algorithm outperformed Natural ActorCritic (NAC). A number of different approaches have been taken to approximate the policy Hessian, but none delivered a proper Hessian estimate. Therefore, no LM actor update method was created.
In VGBP the action is found by optimization of the right hand side of the Bellman equation. VGBP uses the provided reward function and a process model for this optimization. The process model is learned online using local linear regression (LLR). Due to the efficient use of information VGBP shows fast learning on the pendulum and a 2DOF robotic arm.

[PDF]
[Abstract]

13 

Imitation learning for a robotic precision placement task
In industrial environments robots are used for various tasks. At this moment it is not feasible for companies to deploy robots for productions with a limited batch size or for products with large variations. The use of robots for such environments can become feasible through a new generation of robots and software which can adapt quickly to new situations and learn from their mistakes while being programmable without needing an expert.
A concept that can enable the transition to flexible robotics is the combination of imitation learning and reinforcement learning. The purpose of imitation learning is to learn a task by generalizing from observations. The power of imitation learning is that the robot is programmed in an intuitive way while the insight of the teacher is incorporated in the execution of the task.
This research studies the combination of imitation and reinforcement learning, the research is applied to an industrial usecase. The research question of this study is: "Can imitation learning be combined with reinforcement learning to achieve a successful application in an industrial robotic precision placement task?"
To imitate the demonstrated trajectories, Dynamic Movement Primitives (DMPs) are used. The DMPs are used to encode the observed trajectories. DMPs can be seen as a springdamper like system with a nonlinear forcing term. The forcing term is a sum of Gaussian basis functions with each its corresponding weight. Reinforcement learning can be applied to these weights to alter the shape of the trajectory created by a DMP. Policy Gradients with Parameter based Exploration (PGPE) is used as reinforcement learning algorithm to optimize the recorded trajectories.
Experiments done on a UR5 show that without the learning step, the DMPs are able to provide a trajectory that results in a successful execution of a robotic precision placement task. The experiments also show that the learning algorithm is not able to remove noise from a demonstrated trajectory or complete a partial demonstrated trajectory. Therefore it can be concluded that the PGPE algorithm is not suited for reinforcement learning in robotics in its current form. It is therefore recommended to apply a dataefficient version of the PGPE algorithm in order to achieve better learning results.

[PDF]
[Abstract]

14 

Distributed Estimation and Control for Robotic Networks
Mobile robots that communicate and cooperate to achieve a common task have been the subject of an increasing research interest in recent years. These possibly heterogeneous groups of robots communicate locally via a communication network and therefore are usually referred to as robotic networks. Their potential applications are diverse and encompass monitoring, exploration, search and rescue, and disaster relief. From a research standpoint, in this thesis we consider specific aspects related to the foundations of robotic network algorithmic development: distributed estimation, control, and optimization.
The word “distributed” refers to situations in which the cooperating robots have a limited, local knowledge of the environment and of the group, as opposed to a “centralized” scenario, where all the robots have access to the complete information. The typical challenge in distributed systems is to achieve similar results (in terms of performance of the estimation, control, or optimization task) with respect to a centralized system without extensive communication among the cooperating robots.
In this thesis we develop effective distributed estimation, control, and optimization algorithms tailored to the distributed nature of robotic networks. These algorithms strive for limiting the local communication among the mobile robots, in order to be applicable in practical situations. In particular, we focus on issues related to nonlinearities of the dynamical model of the robots and their sensors, to the connectivity of the communication graph through which the robots interact, and to fast feasible solutions for the common (estimation or control) objective.

[PDF]
[Abstract]

15 

Ant Dispersion Routing for Traffic Optimization

[PDF]

16 

Vision based robust control of a rotational pendulum

[PDF]

17 

A visionbased motion estimator for a legged robot.

[PDF]

18 

Loop closing in SLAM using ant colony optimization
For service robots to be employed in normal human environments, autonomous navigation is an important requirement. To enable autonomous navigation, the robot needs to be able to build a map of any unknown environment. The problem of letting a robot build a map and simultaneously localizing itself within the same map is known in robotics as the Simultaneous Localization and Mapping (SLAM) problem.
An important problem in SLAM is known as the loop closing problem. This problem occurs when the trajectory of the robot contains a large loop. After traversing this loop, the accumulated pose error causes the algorithm to fail to recognize the robot has returned to its original position.
In this thesis a novel algorithm is introduced to improve the loop closing behaviour of a widely used SLAM algorithm, called FastSLAM. FastSLAM uses a particle filter to estimate the pose of the robot. The resampling step in the particle filter algorithm causes particle depletion, which inhibits correct loop closures. The proposed algorithm, called Ant Colony Optimization (ACO)SLAM, uses ACO to improve the resampling step of the FastSLAM algorithm. ACOSLAM does this by incorporating a measure of the map consistency and a measure for correct loop closures into the resampling step.
Simulations are used to compare the performance of both algorithms with varying noise settings. Robby, the personal robot developed by the Delft Biorobotics lab, is used for experiments to test the performance of the algorithms in a realistic human environment.

[PDF]
[Abstract]

19 

SoapParticle Clustering for Nonlinear System Identification
The Soap Particle Clustering algorithm for nonlinear system identification is a novel clustering algorithm that can decompose the operating region of a dynamic system into multiple affine operating premises. The clustering algorithm performs comparable to GustafsonKessel fuzzy clustering for system identification, but without the need for the number of clusters and their shape a priori. The algorithm is inspired by the way soap particles move to the interface of grease and water. The artificial soap particles will move to the boundary of the hull of the measurement data cloud, thereby creating a description of the shape of the data cloud. Within this shape we extract the largest convex shapes, which become the clusters, and these in turn define the affine subsystems.

[PDF]
[Abstract]

20 

Ant Colony Optimization for Control
The very basis of this thesis is the collective behavior of ants in colonies. Ants are an excellent example of how rather simple behavior on a local level can lead to complex behavior on a global level that is beneficial for the individuals. The key in the selforganization of ants is communication through pheromones. When an ant forages for food, it is biased to search along trails of stronger pheromone concentrations. The moment it finds food, it will walk back to the nest while depositing pheromones and thereby contributing to the reinforcement of a successful trail. Inspired by this mechanism, research within an engineering context has led to the development of the field of Ant Colony Optimization (ACO). Specifically developed for efficiently solving combinatorial optimization problems, ACO has been successfully applied to routing in road traffic and Internet networks.
In this thesis, we take the principles behind ACO to the domain of control policy learning. A control policy is a mapping from states to actions and our objective is to develop methods to learn the optimal control policy for a given dynamic system by interacting with it. We call our methods Ant Colony Learning (ACL) and their power lies in the fact that there is a set of ants, from which each ant interacts with the system and influences the other ants through updating pheromone levels associated with the visited stateaction pairs. In experiments involving control problems that have a discrete state space and deterministic state transitions, it turns out that ACL converges quickly to the optimal solution. We also observe that increasing the number of ants in the algorithm results in a decrease of the number of trials required for convergence to the optimal policy. An analytical study of the convergence behavior of ACL reveals that for systems with discrete and noiseless state transitions, the expected policy converges to the optimal policy in the case of using only one ant.
Another major part of this thesis deals with the application of ACL to control problems with continuous state spaces. In order to capture a continuous space with a finite number of elements, we study two ways of partitioning the state space and their incorporation in the ACL framework. In crisp ACL, the state space is partitioned using bins. Each state measurement is assigned to exactly one bin, which leads to the introduction of discretization noise, rendering an originally deterministic system nondeterministic and restricting the performance of the algorithm. We find that a better way of partitioning the state space is by using fuzzy triangular membership functions. The continuous state measurement then belongs to multiple membership functions to a certain degree. With fuzzy partitioning, the continuity of the state variables is preserved and no nondeterminism is introduced. We call this method fuzzy ACL. The developed generalized ACL algorithm unifies both crisp and fuzzy ACL.
The behavior and performance of crisp and fuzzy ACL are further studied using simulation experiments. We study the influence of the local and global pheromone decay rates,
the number of ants, and the density of the state space partitioning grid on the learning performance. Especially, the performance of crisp ACL improves for a small local pheromone decay rate, while fuzzy ACL outperforms crisp ACL over the whole line. In general, crisp ACL is much more sensitive to the choice of the pheromone decay parameters than fuzzy ACL. We find that using more ants leads to faster convergence, but that the number of ants does not need to be extremely large to obtain a satisfactory performance. With regard to the scaling of ACL, crisp ACL reveals a slow, but gradual improvement of the learning for an increasing state space partitioning density. Fuzzy ACL, on the other hand, improves more rapidly and requires fewer ants to learn a better control policy.
Finally, we present a general modeling framework for swarms of moving agents. It turns out that ACL fits within this framework and as such can be unified with other swarm intelligence techniques. In the future, this could result in beneficial integration of elements from other swarm intelligence techniques into ACL, or the other way around.

[PDF]
[Abstract]
