Safe Optimization of Steel Manufacturing with Reinforcement Learning

Author: Kosiorek, Anna (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Spaan, M.T.J. (mentor); Merkestein, Daan (graduation committee); Oliehoek, F.A. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Data Science and Technology
Date: 2020-08-21

Abstract: Steel production is a complex problem, and little has been done to improve it using Reinforcement Learning techniques. Most studies focus on decomposing it into sub-problems instead of tackling it as a whole. Research has shown promising results in the area of safe policy improvement on toy problems. These algorithms are not only computationally tractable but also do not compromise the agent's safety during learning. This thesis investigates how they perform on the real-world problem of improving steel production logistics. We take a simulation of a steel plant that uses hand-crafted heuristics for scheduling tasks and model it as a Markov Decision Process. We experiment with safe policy improvement algorithms using different baseline policies. The problem suffers from the well-known "curse of dimensionality", so the algorithms are adjusted to cope with the rapidly growing complexity. The methods learn from fewer samples than exploration methods. The results are especially promising with a highly stochastic baseline policy, as the agent then has a better understanding of the large environment. The next focus is on the factored representation, which has the advantage of better exploiting the structure of the problem. However, in our setting, the algorithms become too computationally expensive.
Subject: Reinforcement Learning; Safety; Steel manufacturing; production optimisation; safe reinforcement learning
To reference this document use: http://resolver.tudelft.nl/uuid:efbe886c-1b39-4696-9032-3fc1bbe7e445
Part of collection: Student theses
Document type: master thesis
Rights: © 2020 Anna Kosiorek
Files: Thesis.pdf (PDF, 7.07 MB)