Concolic Firmware Emulation using Dynamic State Selection

More Info
expand_more

Abstract

In recent years, computers have found their way into nearly every part of life. This led to the creation of many embedded devices, which are usually quite different from the more commonly known computers and each other. The cause of this is the diversity in constraints that are placed on these devices, in terms of size, weight, energy consumption and more. It is important to test and analyse these embedded systems, to avoid malicious actors to gain control of these systems, or for bugs in the code running on these systems to cause problems. In contrast to other computers, embedded systems are often tightly integrated with their environment and are designed to control it. However, there still are major challenges when analysing these systems, just because of their tight integration with the environment. When executing the code out of the environment it was intended for, it often functions differently, if at all. Current solutions attempt to solve this problem in various ways, but they all fail to create a system that is general and flexible. As such, this project set out to find a methodology that solves this problem in an automated way, while ensuring the device's code executes accurately. This thesis presents a methodology to solve this problem in a general and flexible way, using symbolic execution. By using symbolic execution to handle interaction with the environment, the system can realistically and generally emulate embedded devices. Additionally, symbolic states are made concrete after a while, to continue executing concretely. When the system is executing concretely, user-defined hooks can be executed, which can be used to further instrument the process, allowing for great flexibility. Such extensions could be used to attach a debugger to the system, or by redirection output from a serial connection to a file. We conduct an empirical study to find that the system emulates accurately, reaching a nearly identical result to the same code running on real hardware, using four diverse samples. Additionally, we compare the system against Jetset, a system that attempts to solve a similar problem using just symbolic execution, and find that, although our system is considerably slower, it remains accurate. The difference in performance is primarily due to their system being more tightly integrated. Still, the added flexibility our system provides makes it more suitable to extend with further analyses, and thus might make it preferable in some situations. Finally, we provide the source code of an implementation of the system to allow researchers to continue developing and improving this system.