RoCE based 100GbE RDMA network stack on FPGA hardware

Abstract

Big data analytics is one of the foundations of booming technologies such as machine learning, genetics/genomics, and computer vision. These big data applications require large amounts of data transfer for distributed and parallel processing. Networking is thus a crucial facilitator and can have a significant impact on big data processing.

In a computing system with a common network stack such as the TCP/IP protocol suite, many expensive memory operations are necessary to process networking traffic. This means a large percentage of CPU resources is occupied by networking rather than data processing. The memory-copying overhead introduced by networking not only reduces throughput but also increases latency. As a result, networking becomes a major bottleneck for big data applications. This problem can be addressed by applying Remote Direct Memory Access (RDMA) technology to the network stack. RDMA enables zero-copy transfers and allows the CPU to be bypassed, improving both throughput and latency.
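To make the zero-copy idea concrete, the sketch below shows how a one-sided RDMA WRITE is posted with the standard software verbs API (libibverbs). This is only an illustration of the general RDMA model described above, not the interface of the FPGA stack developed in this work; the function name and parameters are hypothetical, while the ibv_* calls are the standard API.

```cpp
#include <infiniband/verbs.h>
#include <cstdint>

// Hypothetical helper: post a one-sided RDMA WRITE. The payload moves
// directly from the registered local buffer into the remote buffer,
// with no intermediate copies and no remote CPU involvement.
int post_rdma_write(ibv_qp *qp, ibv_mr *mr, void *local_buf, uint32_t len,
                    uint64_t remote_addr, uint32_t remote_rkey) {
    ibv_sge sge{};
    sge.addr   = reinterpret_cast<uintptr_t>(local_buf); // pre-registered memory
    sge.length = len;
    sge.lkey   = mr->lkey;

    ibv_send_wr wr{};
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;  // one-sided operation
    wr.send_flags          = IBV_SEND_SIGNALED;  // request a completion entry
    wr.wr.rdma.remote_addr = remote_addr;        // target address on the peer
    wr.wr.rdma.rkey        = remote_rkey;        // peer's memory key

    ibv_send_wr *bad_wr = nullptr;
    return ibv_post_send(qp, &wr, &bad_wr);      // the NIC/stack performs the transfer
}
```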

In this work, we developed an open-source 100 Gbps RDMA network stack on Field Programmable Gate Array (FPGA) hardware. The stack follows the RDMA over Converged Ethernet (RoCE) architecture and targets the Alveo FPGA platform. It includes a User kernel that can be customized for user applications, which means that computing tasks can also be offloaded to the RoCE stack. Finally, we evaluate the stack and compare it with existing TCP/IP and RDMA stacks such as EasyNet and StRoM. The results show that the developed RDMA stack achieves a throughput of 100 Gbps, with an RDMA READ latency of around 4 us and an RDMA WRITE latency of around 3.5 us for 64 B messages. It shows a large throughput advantage over the TCP/IP stack for message sizes smaller than 1 MB, and its latency is also slightly lower than that of the TCP/IP stack.
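The abstract does not spell out the User kernel interface, so the following is a hypothetical sketch of what offloading a simple computation next to the RoCE stack could look like in Vitis HLS, assuming 512-bit AXI4-Stream connections (64 B per beat, a common data-path width for 100 GbE designs on Alveo). All names, widths, and pragmas are assumptions for illustration, not the actual interface of the developed stack.

```cpp
#include <ap_axi_sdata.h>
#include <ap_int.h>
#include <hls_stream.h>

// 512-bit AXI4-Stream beat: 64 bytes per cycle matches a 100 GbE data path.
typedef ap_axiu<512, 0, 0, 0> pkt_t;

// Hypothetical free-running User kernel: consumes RDMA payload arriving from
// the RoCE stack, applies a placeholder transformation, and streams the
// result back, so the computation is offloaded alongside the network stack.
void user_kernel(hls::stream<pkt_t> &from_roce, hls::stream<pkt_t> &to_roce) {
#pragma HLS INTERFACE axis port=from_roce
#pragma HLS INTERFACE axis port=to_roce
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS PIPELINE II=1

    if (!from_roce.empty()) {
        pkt_t beat = from_roce.read();
        beat.data = ~beat.data;   // placeholder computation on the payload
        to_roce.write(beat);      // keep/last side-band signals pass through unchanged
    }
}
```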