Binary formats are used in low-level communication between systems. This binary data must be validated for correct structure and allowed content to ensure meaningful data exchange. This is especially important in high-security contexts such as air-gapped networks or classified communication channels. Such contexts can also be dynamic, requiring a flexible system to support quick changes to the allowed format.
This thesis presents the Binary Verdict Engine, an FPGA-based system that assesses a binary data stream and issues a verdict on the data it has checked. It supports a wide range of binary formats without reconfiguring the hardware when the format changes. It achieves this through the programmability of its virtual machine architecture: the system executes the instructions of a program binary, called a schema program. A language for writing schemas was created to define how the data should adhere to a specific binary format. Furthermore, a custom instruction set architecture was designed, consisting of instructions that traverse and assess data or update control flow. Assessment consists of two types of assertions on the data. Field assertions exactly match or numerically compare a field of the data to a constant, and length assertions check whether the length of a section is equal to the value of an earlier field specifying that length. The engine consists of four modules: an input and output system that traverses the binary data field by field, a controller that executes instructions and manages the verdict state, an instruction fetcher that supplies instructions to the system and manages the instruction pipeline, and a stack that supports length assertions and control flow.
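The two assertion types described above can be illustrated with a minimal software sketch. This is not the engine's actual instruction set or schema language: the instruction names (`FIELD`, `SKIP`, `LEN_BEGIN`, `LEN_END`), the tuple encoding, the big-endian byte order, and the function `assess` are all hypothetical, chosen only to show how a stack might pair a length field with the end of the section it describes.

```python
import operator

# Hypothetical comparison operators for field assertions (an assumption,
# not the engine's real instruction encoding).
COMPARATORS = {"eq": operator.eq, "lt": operator.lt, "gt": operator.gt}


def assess(data: bytes, program: list) -> bool:
    """Return True (accept) if every assertion in the program holds."""
    pos = 0      # current read position in the data stream
    stack = []   # expected end offsets of open length-delimited sections
    for instr in program:
        op = instr[0]
        if op == "FIELD":
            # ("FIELD", width, cmp, constant): field assertion.
            _, width, cmp, const = instr
            if pos + width > len(data):
                return False
            value = int.from_bytes(data[pos:pos + width], "big")
            pos += width
            if not COMPARATORS[cmp](value, const):
                return False
        elif op == "SKIP":
            # ("SKIP", width): traverse a field without checking it.
            _, width = instr
            if pos + width > len(data):
                return False
            pos += width
        elif op == "LEN_BEGIN":
            # ("LEN_BEGIN", width): read a length field and remember
            # where the section that follows it must end.
            _, width = instr
            if pos + width > len(data):
                return False
            length = int.from_bytes(data[pos:pos + width], "big")
            pos += width
            stack.append(pos + length)
        elif op == "LEN_END":
            # ("LEN_END",): length assertion, the section must end here.
            if not stack or stack.pop() != pos:
                return False
        else:
            raise ValueError(f"unknown instruction: {op}")
    # Accept only if all data was consumed and no section is left open.
    return pos == len(data) and not stack


# Hypothetical schema: a tag byte equal to 1, then a one-byte length
# followed by a section whose first byte must equal 0xAA.
schema = [
    ("FIELD", 1, "eq", 0x01),
    ("LEN_BEGIN", 1),
    ("FIELD", 1, "eq", 0xAA),
    ("SKIP", 1),
    ("LEN_END",),
]

good = bytes([0x01, 0x02, 0xAA, 0xBB])  # length field matches the section
bad = bytes([0x01, 0x03, 0xAA, 0xBB])   # length field claims 3 bytes, only 2 follow
```

In this sketch, `assess(good, schema)` accepts the data, while `assess(bad, schema)` rejects it because the stored section end does not match the position reached at `LEN_END`. Nesting several `LEN_BEGIN`/`LEN_END` pairs would exercise the stack in the way a hierarchical format does.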
The design is implemented on an FPGA and evaluated for flexibility and performance. Benchmarks range from assessing externally defined flat formats, such as Internet packet headers, to self-describing hierarchical formats, such as ASN.1 DER. The varied use cases across the benchmarks demonstrate the system's flexibility, which comes at the cost of lower performance than fully custom FPGA designs. Synthetic benchmarks show that throughput decreases reciprocally and latency increases linearly as schemas become more complex, and that downtime when switching between schemas is minimal. The system establishes itself as a flexible validation system for diverse and dynamic use cases.