Use Reinforcement Learning to Generate Testing Commands for Onboard Software of Small Satellites

Abstract

Programmers usually write test cases by hand to test onboard software. However, this procedure is time-consuming and requires substantial prior knowledge. As a result, small satellite developers may not be able to test their software thoroughly.

A promising solution to this problem is reinforcement learning (RL) based testing, which searches for testing commands that maximise the return, i.e., the cumulative reward that encodes the testing goal. Apart from the reward function and hyperparameters, testers need not specify any prior knowledge. RL-based testing is already mature in other software testing scenarios, such as GUI testing. However, migrating it to onboard software testing remains challenging because the testing environments differ.
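The return being maximised can be written out with the standard RL definition. The symbols below (discount factor \gamma, horizon T, covered-branch set C_t) are generic notation, not taken from this work. The per-step reward "number of newly covered branches" is only an assumed example of a coverage reward. Under that assumption and with \gamma = 1, the return telescopes to the total coverage gained during a test session.

```latex
\begin{aligned}
G_t &= \sum_{k=0}^{T-t-1} \gamma^{k}\, r_{t+k+1}, \\
r_{t+1} &= |C_{t+1}| - |C_t| \qquad \text{(assumed coverage reward)}, \\
G_0\big|_{\gamma = 1} &= \sum_{k=0}^{T-1} \big(|C_{k+1}| - |C_k|\big) = |C_T| - |C_0|.
\end{aligned}
```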

This work is the first to apply reinforcement learning to testing real onboard software, and one of the few studies that perform RL-based testing on embedded software without a GUI. The RL agent observes the current code coverage and the interaction history. It then either selects a predefined command or assembles a command from predefined parameters so as to maximise the cumulative reward. The reward can be code coverage (for coverage testing) or estimated CPU load (for stress testing). The experiments compare three RL algorithms, namely tabular Q-learning, the Double Duelling Deep Q-Network (D3QN), and Proximal Policy Optimization (PPO), against a random testing baseline and a genetic algorithm baseline.
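The agent loop described above can be illustrated with a minimal tabular Q-learning sketch. Everything in it is hypothetical and not the author's implementation:

- **Software under test:** a four-command toy program and its branch IDs.
- **Observation:** the reported mode and arming flag plus the set of covered branches, standing in for the interaction history.
- **Reward:** the number of newly covered branches per step, which is one possible coverage reward.
- **Hyperparameters:** illustrative only.

A policy that picks commands uniformly at random plays the role of the random-testing baseline.

```python
import random
from collections import defaultdict

# Hypothetical command set and branch IDs of a toy onboard program.
COMMANDS = ["PING", "NOMINAL", "ARM", "DEPLOY"]
ALL_BRANCHES = {"ping", "nominal_enter", "nominal_already",
                "arm_ok", "arm_reject", "deploy_ok", "deploy_reject"}


class ToyOnboardSoftware:
    """Hypothetical stand-in for the software under test.

    execute() returns (response, branch IDs hit), imitating what a
    command/response parser and a branch coverage tool might report.
    """

    def __init__(self):
        self.mode, self.armed = "SAFE", False

    def execute(self, cmd):
        if cmd == "PING":
            return "ACK", {"ping"}
        if cmd == "NOMINAL":
            if self.mode == "SAFE":
                self.mode = "NOMINAL"
                return "ACK", {"nominal_enter"}
            return "NACK", {"nominal_already"}
        if cmd == "ARM":
            if self.mode == "NOMINAL":
                self.armed = True
                return "ACK", {"arm_ok"}
            return "NACK", {"arm_reject"}
        if cmd == "DEPLOY":
            if self.armed:
                self.mode, self.armed = "SAFE", False
                return "ACK", {"deploy_ok"}
            return "NACK", {"deploy_reject"}
        raise ValueError(f"unknown command {cmd!r}")


def observe(sut, covered):
    # Toy observation: reported mode/arming flag plus coverage so far.
    return (sut.mode, sut.armed, frozenset(covered))


def run_episode(policy, horizon=8):
    """Run one test session with policy(state); return the covered branch set."""
    sut, covered = ToyOnboardSoftware(), set()
    for _ in range(horizon):
        _, hit = sut.execute(policy(observe(sut, covered)))
        covered |= hit
    return covered


def train_q_learning(episodes=3000, horizon=8, alpha=0.5, gamma=0.9,
                     epsilon=0.3, seed=0):
    """Epsilon-greedy tabular Q-learning with a 'newly covered branches' reward."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, command) -> estimated return
    for _ in range(episodes):
        sut, covered = ToyOnboardSoftware(), set()
        state = observe(sut, covered)
        for _ in range(horizon):
            if rng.random() < epsilon:
                cmd = rng.choice(COMMANDS)  # explore
            else:
                cmd = max(COMMANDS, key=lambda c: q[(state, c)])  # exploit
            _, hit = sut.execute(cmd)
            reward = len(hit - covered)  # newly covered branches
            covered |= hit
            next_state = observe(sut, covered)
            target = reward + gamma * max(q[(next_state, c)] for c in COMMANDS)
            q[(state, cmd)] += alpha * (target - q[(state, cmd)])
            state = next_state
    return q


if __name__ == "__main__":
    q = train_q_learning()
    rng = random.Random(1)

    def greedy(s):
        return max(COMMANDS, key=lambda c: q[(s, c)])

    def random_baseline(s):
        return rng.choice(COMMANDS)

    total = len(ALL_BRANCHES)
    print("Q-learning coverage:", len(run_episode(greedy)), "/", total)
    print("random coverage:    ", len(run_episode(random_baseline)), "/", total)
```

Because the toy program is deterministic, this setting is closer to the case where the abstract reports simpler methods being more useful. A CPU-load reward for stress testing would replace `reward` with a load estimate.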

This study also performs regression testing with a trained RL agent, i.e., it uses the agent to test a version of the onboard software that the agent has never seen before. To make this possible, the agent takes as input a graph annotated with code coverage information. The graph is extracted from the onboard software source code via static code analysis. The work evaluates two graph neural network architectures (GGNN and GAT), each combined with several graph pooling mechanisms, to process this graph input.
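Graph pooling is what lets one agent handle source-code graphs of different sizes. Whatever the node count, a pooling readout produces a fixed-length vector that a policy network can consume.

The sketch below is a simplified stand-in under stated assumptions, not GGNN or GAT:

- **Message passing:** a single round of mean aggregation. A GGNN would add gated (GRU-style) updates, and a GAT would weight neighbours with learned attention.
- **Node features:** hypothetical, chosen for illustration.
- **Weights:** an identity matrix here, where a real model would learn them during training.
- **Readout:** plain mean and max pooling.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def message_passing(features: Sequence[Sequence[float]],
                    edges: Sequence[Tuple[int, int]],
                    weight: Sequence[Sequence[float]]) -> List[Vector]:
    """One simplified message-passing round (mean aggregation, not GGNN/GAT).

    Each node averages its own feature vector with those of its predecessors
    (edges are (src, dst)), applies a shared linear map `weight`, then ReLU.
    """
    incoming = [[i] for i in range(len(features))]  # self-loop per node
    for src, dst in edges:
        incoming[dst].append(src)
    out = []
    for i, neighbours in enumerate(incoming):
        dim = len(features[i])
        agg = [sum(features[j][k] for j in neighbours) / len(neighbours)
               for k in range(dim)]
        out.append([max(0.0, sum(w * x for w, x in zip(row, agg)))
                    for row in weight])
    return out


def pool(node_embeddings: Sequence[Vector], mode: str = "mean") -> Vector:
    """Graph-level readout: a fixed-size vector regardless of node count."""
    dims = range(len(node_embeddings[0]))
    if mode == "mean":
        return [sum(h[d] for h in node_embeddings) / len(node_embeddings)
                for d in dims]
    if mode == "max":
        return [max(h[d] for h in node_embeddings) for d in dims]
    raise ValueError(f"unknown pooling mode {mode!r}")


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
    # Hypothetical node features: [branch covered?, is a branch node?]
    v1 = message_passing([[1, 1], [1, 0], [0, 0]], [(0, 1), (0, 2)], w)
    v2 = message_passing([[1, 1], [0, 1], [1, 0], [0, 0], [0, 0]],
                         [(0, 1), (0, 2), (1, 3), (2, 4)], w)
    print(pool(v1), pool(v2, "max"))  # both are 2-dimensional
```

The three-node and five-node graphs both yield two-dimensional embeddings, which is the property a regression-testing agent needs when the software version, and hence the graph, changes.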

Apart from the test command generation algorithms, this work implements supporting middleware:

- a command/response parser;
- a state identification module;
- a branch coverage collection tool;
- a tool that extracts the graph representation and node features.

During onboard software testing, either the onboard computer (OBC) or the electrical ground support equipment (EGSE) can act as the bus master. The command generation algorithms can run on a lab PC or a cloud server.

The research reveals both the advantages and the drawbacks of using reinforcement learning to test onboard software. On the one hand, RL-based testing performs well in non-deterministic environments (e.g., stress testing) and in regression testing. On the other hand, simpler methods such as random testing and the genetic algorithm are more useful in deterministic environments.

This document also introduces the relevant background knowledge. It offers many recommendations for future work, such as improving sample efficiency and generalisation, and learning a model for fault detection in satellite operation.