Traditional data networks passively transport data from one node to another and the role of computation within such networks is extremely limited. The discussion on computation in the network devices have existed since 1996 through the active networks architecture and Open Signalling communities. Active Networks allowed the network to perform customized computations on the user data. Open Signalling was a series of workshops entitled with the goal of making ATM, Internet and mobile networks more open, extensible and programmable. Although such discussion existed for a long time, the real hardware to run computation in the network devices were not available. Recently, Programmable Network devices (PNDs) such as SmartNICs and programmable switches allows to run customized computation on the network devices. Nowadays, In-network computing (INC), also known as In-network computation or Netcompute, refers to the execution of customized programs within such PNDs.

In-network computing

In-network computing can be applied in various areas; we specifically use it to offload and monitor RDMA traffic. We discuss different strategies for monitoring RDMA RoCE protocol to implement an accurate and lightweight in-network monitoring in an HPC cloud setting. In spite of significant advancements in network monitoring, the task of efficiently monitoring HPC networks remains a challenging endeavor, primarily due to the unique characteristic of HPC network, which necessitates specialized monitoring tools tailored for their requirements. One of the primary factors contributing to this challenge arises from a specific characteristic exhibited by HPC protocols that bypass the processing unit (CPU) and operating system (OS) of a remote system when accessing data, rendering conventional monitoring approaches ineffective in capturing and analyzing network traffic within the HPC context at an end node. Moreover, because of the high cost associated with HPC devices allocating a dedicated processing unit solely for network monitoring is an inefficient and undesirable solution. These circumstances have compelled us to investigate low-overhead approaches for monitoring HPC networks through the utilization of emerging programmable network devices, effectively leveraging their computing power to analyze network traffic. We established a test environment compromising three machines, each equipped with a BlueField-2 operating on Ubuntu 18.04.6. These machines are interconnected by an NVIDIA SN2100 switch through 100Gbps RoCE links, we evaluated the overhead of our monitoring strategy from augmented network traffic. The network traffic of monitoring depends on the sFlow parameters settings such as sampling rate and sFlow counter-poll-interval.

In-network computing

In-network computing