Active Perception in Autonomous Fruit Harvesting

Viewpoint Optimization with Deep Reinforcement Learning


Abstract

This MSc thesis presents the development of a viewpoint optimization framework to address the problem of detecting occluded fruits in autonomous harvesting. A Deep Reinforcement Learning (DRL) algorithm is developed to train a robotic manipulator to navigate to occlusion-free viewpoints of the target tomato. Two Convolutional Neural Networks (CNNs), You Only Look Once (YOLO) version 3 and Mask Regional Convolutional Neural Network (Mask R-CNN), are trained and evaluated to obtain visual information about the tomatoes. Both trained CNNs achieve high detection accuracy and surpass other fruit detection methods. The instance segmentation output of Mask R-CNN is combined with an image processing algorithm in an occlusion modelling method, and the vision-based reward formulation of the DRL algorithm is closely tied to the resulting occlusion metric. The DRL algorithm is trained and evaluated in a simulation environment with a camera mounted on a robotic arm. Different reward schemes and exploration strategies are used during DRL training to show their effect on training performance. Training performance is assessed through the maximum visible fraction of the target tomato per episode, the number of steps per episode, the trajectory of the robot's end-effector, and the initial and final viewpoints of the target tomato in each episode. The evaluation results show satisfactory performance, which depends on the reward and exploration strategy: in the majority of cases, the robot obtains a fully visible view of the target tomato after only a few steps.
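
To illustrate how an occlusion metric of this kind can drive the reward, the sketch below computes a visible-fraction measure from a target segmentation mask and an occluder mask, and turns the change in that fraction into a step reward. This is a minimal illustration under stated assumptions, not the thesis' implementation: the function and parameter names (visible_fraction, step_reward, done_threshold) are hypothetical, and the thesis defines its own occlusion modelling and reward formulation.

```python
# Hypothetical sketch: a visible-fraction occlusion metric and a reward
# based on the change in visibility between viewpoints. Masks are assumed
# to come from instance segmentation (e.g. Mask R-CNN) and an
# occluder-detection step; both are boolean arrays of the same image shape.
import numpy as np


def visible_fraction(target_mask: np.ndarray, occluder_mask: np.ndarray) -> float:
    """Fraction of the target's pixels that are not covered by occluders."""
    target_pixels = target_mask.sum()
    if target_pixels == 0:
        return 0.0  # target not detected from this viewpoint
    visible = np.logical_and(target_mask, np.logical_not(occluder_mask)).sum()
    return float(visible) / float(target_pixels)


def step_reward(prev_fraction: float, new_fraction: float,
                done_threshold: float = 0.95, bonus: float = 1.0) -> float:
    """Reward the gain in visibility; add a bonus when the target is
    (almost) fully visible, which would end the episode."""
    reward = new_fraction - prev_fraction
    if new_fraction >= done_threshold:
        reward += bonus
    return reward


if __name__ == "__main__":
    # Toy example: a 10x10 target half covered by an occluding leaf.
    target = np.zeros((20, 20), dtype=bool)
    target[5:15, 5:15] = True
    leaf = np.zeros_like(target)
    leaf[5:15, 5:10] = True

    frac = visible_fraction(target, leaf)
    print(frac)                    # 0.5
    print(step_reward(0.5, 0.96))  # 0.46 visibility gain + 1.0 completion bonus
```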