
Choosing a Development Environment for NVIDIA BlueField DPU Applications

NVIDIA DOCA libraries simplify the development process of BlueField DPU applications

This post describes different ways to compile an application using various development environments for the BlueField DPU.

Step-A 

Step-B 

Go get a cup of coffee… 

Step-C 

How often have you seen “Go get a coffee” in the instructions? As a developer, I found early on that this pesky quip is the bane of my life. Context switches, no matter the duration, are a high cost to pay in the application development cycle. Of all the steps that require you to step away, waiting for an application to compile is the hardest to shake off. 

As we all enter the new world of NVIDIA BlueField DPU application development, it is important to set up the build step efficiently, so that you can {code => compile => unit-test} seamlessly. In this post, I go over different ways to compile an application for the DPU. 

Free Range Routing (FRR) with the DOCA dataplane plugin 

In the DPU application development series, I talked about creating a DOCA dataplane plugin in FRR for offloading policies. FRR’s code count is close to a million lines (789,678 SLOC), which makes it a great candidate for measuring build times.  

Developing directly on the BlueField DPU 

The DPU has an Arm64 architecture, and one quick way to get started on DPU applications is to develop directly on the DPU. This test uses an NVIDIA BlueField-2 with 8 GB of RAM and eight Cortex-A72 cores. 

I installed the BlueField boot file (BFB), which provides the Ubuntu 20.04.3 OS image for the DPU. It also includes the libraries for DOCA-1.2 and DPDK-20.11.3. To build an application with the DOCA libraries, I added the DPDK pkg-config directory to PKG_CONFIG_PATH.

root@dpu-arm:~# export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig 
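The export above can be made repeatable, for example in a shell profile. The following is a minimal sketch, not part of the original setup: the helper name add_pkg_config_dir is hypothetical, and the check assumes the DPDK install ships a pkg-config module named libdpdk, which is the name upstream DPDK uses.

```shell
# Append a directory to PKG_CONFIG_PATH only if it is not already listed.
# add_pkg_config_dir is a hypothetical helper name used for illustration.
add_pkg_config_dir() {
    dir="$1"
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$dir:"*) ;;   # already present, nothing to do
        *) PKG_CONFIG_PATH="${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}$dir" ;;
    esac
    export PKG_CONFIG_PATH
}

add_pkg_config_dir /opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig

# Optional sanity check (assumes the module is called libdpdk).
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists libdpdk; then
    echo "DPDK version seen by pkg-config: $(pkg-config --modversion libdpdk)"
else
    echo "libdpdk is not visible to pkg-config on this machine"
fi
```

Calling the helper twice leaves the variable unchanged, so sourcing the profile repeatedly does not grow the path.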

Next, I set up my code workspace on the DPU by cloning FRR and switching to the DOCA dataplane plugin branch.

root@dpu-arm:~/code# git clone https://github.com/AnuradhaKaruppiah/frr.git 
root@dpu-arm:~/code# cd frr 
root@dpu-arm:~/code/frr# git checkout dp-doca 

FRR requires a list of constantly evolving prerequisites that are enumerated in the FRR community docs. With those dependencies installed, I configured FRR to include the DPDK and DOCA dataplane plugins.

root@dpu-arm:~/code/frr# ./bootstrap.sh 

root@dpu-arm:~/code/frr# ./configure --build=aarch64-linux-gnu --prefix=/usr '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' --sysconfdir=/etc --localstatedir=/var --disable-silent-rules '--libdir=${prefix}/lib/aarch64-linux-gnu' '--libexecdir=${prefix}/lib/aarch64-linux-gnu' --disable-maintainer-mode --disable-dependency-tracking --enable-exampledir=/usr/share/doc/frr/examples/ --localstatedir=/var/run/frr --sbindir=/usr/lib/frr --sysconfdir=/etc/frr --with-vtysh-pager=/usr/bin/pager --libdir=/usr/lib/aarch64-linux-gnu/frr --with-moduledir=/usr/lib/aarch64-linux-gnu/frr/modules "LIBTOOLFLAGS=-rpath /usr/lib/aarch64-linux-gnu/frr" --disable-dependency-tracking --disable-dev-build --enable-systemd=yes --enable-rpki --with-libpam --enable-doc --enable-doc-html --enable-snmp --enable-fpm --disable-zeromq --enable-ospfapi --disable-bgp-vnc --enable-multipath=128 --enable-user=root --enable-group=root --enable-vty-group=root --enable-configfile-mask=0640 --enable-logfile-mask=0640 --disable-address-sanitizer --enable-cumulus=yes --enable-datacenter=yes --enable-bfdd=no --enable-sharpd=yes --enable-dp-doca=yes --enable-dp-dpdk=yes 

As I used the DPU as my development environment, I built and installed the FRR binaries in place:

root@dpu-arm:~/code/frr# make -j12 all && make install 

Here’s how the build times fared. I measured them in two ways:

  • Time to build and install the binaries using make -j12 all and make install
  • Time to build the same binaries but also assemble them into a Debian package using dpkg-buildpackage -j12 -uc -us 

The first method is used for coding and unit testing. The second method of generating debs is needed to compare with build times on other external development environments.
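The real, user, and sys figures reported in this post are the three values that the shell's time command prints. For scripted comparisons, a wrapper along the following lines can log the wall-clock time of each build step. This is only a sketch: the function name timed is hypothetical, it records whole seconds of elapsed (real) time only, and it is not how the numbers in this post were collected.

```shell
# Run a command, then print its label, elapsed wall-clock seconds, and exit status.
# timed is a hypothetical helper name used for illustration.
timed() {
    label="$1"; shift
    start=$(date +%s)
    "$@" && status=0 || status=$?
    end=$(date +%s)
    echo "$label: $((end - start)) s (exit status $status)"
    return "$status"
}

# Harmless demonstration that runs anywhere.
timed "demo (sleep 1)" sleep 1

# On a build machine the calls would look like this:
#   timed "complete make"  make -j12 all
#   timed "Debian package" dpkg-buildpackage -j12 -uc -us
```

The wrapper returns the exit status of the wrapped command, so a failed build still fails the calling script.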

Environment                  Real              User               Sys
DPU Arm (complete make)      2 min 40.529 s    16 min 29.855 s    2 min 1.534 s
DPU Arm (Debian package)     5 min 23.067 s    20 min 33.614 s    2 min 49.628 s

Table 1. DPU-Arm build times

The difference in times is expected. Generating a package involves several additional steps. 
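One way to read these numbers is to divide the CPU time (user plus sys) by the wall-clock time (real), which gives a rough estimate of how many cores were busy on average. The sketch below does that arithmetic with awk; the function name busy_cores is hypothetical and the interpretation is an approximation, not a profiler result.

```shell
# Print (user + sys) / real, with every time given as minutes and seconds.
# Arguments: real_min real_sec user_min user_sec sys_min sys_sec
busy_cores() {
    awk -v r_m="$1" -v r_s="$2" -v u_m="$3" -v u_s="$4" -v s_m="$5" -v s_s="$6" \
        'BEGIN { printf "%.1f\n", ((u_m*60 + u_s) + (s_m*60 + s_s)) / (r_m*60 + r_s) }'
}

echo "complete make:  $(busy_cores 2 40.529 16 29.855 2 1.534) cores busy on average"
echo "Debian package: $(busy_cores 5 23.067 20 33.614 2 49.628) cores busy on average"
```

With the values measured on the DPU, this prints 6.9 for the complete make and 4.3 for the Debian package build. That suggests the plain build keeps most of the eight cores occupied, while the packaging run spends more of its time in steps that do not parallelize as well.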

There are some clear advantages to using the DPU as your development environment.

  • You can code, build and install, and then unit-test without leaving your workspace.
  • You can optimize the build for incremental code changes.

The second advantage usually means a massive reduction in build time compared to a complete build. For example, I modified the DOCA dataplane code in FRR and rebuilt with these results:

root@dpu-arm:~/code/frr# time make -j12 

>>>>>>>>>>>>> snipped make output >>>>>>>>>>>> 

real    0m3.119s 
user    0m2.794s 
sys     0m0.479s 

While that may make things easier, it requires reserving a DPU indefinitely for every developer for the sole purpose of application development or maintenance. Your development environment may also require more memory and horsepower, making this a less viable option long-term. 

Developing on an x86 server 

My BlueField-2 DPU was hosted by an x86-64 Ubuntu 20.04 server, and I used this server for my development environment.

root@server1-x86:~# lscpu | grep -E "^CPU\(s\):|^Model name" 

CPU(s):               32 

Model name:    Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz 

root@server1-x86:~# grep MemTotal /proc/meminfo 

MemTotal:       131906300 kB 

In this case, the build-machine is x86 and the host-machine where the app is going to run is DPU-Arm64. There are several ways to do this:

  • Use an Arm emulation on the x86 build-machine. A DOCA development container is available as a part of the DOCA packages.
  • Use a cross-compilation toolchain. 

In this test, I used the first option as it was the easiest. The second option may perform differently, but creating that toolchain has its challenges. 

I downloaded and loaded the bfb_builder_doca_ubuntu_20.04 container on my x86 server and fired it up.

root@server1-x86:~# sudo docker load -i bfb_builder_doca_ubuntu_20.04-mlnx-5.4.tar 
root@server1-x86:~# docker run -v ~/code:/code --privileged -it -e container=docker doca_v1.11_bluefield_os_ubuntu_20.04-mlnx-5.4:latest 
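A quick way to confirm that a shell is really running Arm64 userspace is to look at what uname reports. The sketch below assumes the container relies on QEMU user-mode emulation registered through binfmt_misc; the post only says "Arm emulation", so treat the qemu-aarch64 entry name as an assumption. The helper name is_arm64 is hypothetical.

```shell
# Succeed when the current shell reports an Arm64 machine type.
# is_arm64 is a hypothetical helper name used for illustration.
is_arm64() {
    case "$(uname -m)" in
        aarch64|arm64) return 0 ;;
        *)             return 1 ;;
    esac
}

if is_arm64; then
    echo "Arm64 userspace: $(uname -m)"
else
    echo "Not Arm64: $(uname -m)"
fi

# Assumption: qemu-user-static style setups register an entry named qemu-aarch64.
if [ -e /proc/sys/fs/binfmt_misc/qemu-aarch64 ]; then
    echo "A qemu-aarch64 binfmt handler is registered on this kernel"
fi
```

Run inside the container, the first check should report Arm64 even though the host CPU is x86, which is a reminder that every compiler invocation goes through the emulation layer.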

The DOCA and DPDK libraries come preinstalled in this container, and I just had to add them to PKG_CONFIG_PATH.

root@86b87b0ab0c2:/code # export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig 

I set up the workspace and FRR prerequisites within the container, same as with the previous option.

root@86b87b0ab0c2:/code # git clone https://github.com/AnuradhaKaruppiah/frr.git 
root@86b87b0ab0c2:/code # cd frr 
root@86b87b0ab0c2:/code/frr # git checkout dp-doca 

I could build my application within this DOCA container, but I couldn’t test it in place. So, the FRR binaries had to be built and packaged into debs, which are then copied over to the BlueField DPU for testing. I set up the FRR Debian rules to match the FRR build configuration used in the previous option and generated the package:

root@86b87b0ab0c2:/code/frr # dpkg-buildpackage -j12 -uc -us 
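dpkg-buildpackage writes the generated .deb files to the parent of the source directory, so with the volume mount used here they should end up in /code inside the container and ~/code on the x86 host. The copy-and-install step could then look like the sketch below. The SSH target root@dpu-arm, the staging directory /tmp/frr-debs, and the helper name deploy_debs are all hypothetical; the function only prints the commands unless DRY_RUN=0 is set.

```shell
# Copy generated .deb files to a DPU and install them there.
# Prints the commands by default (DRY_RUN=1); set DRY_RUN=0 to execute them.
deploy_debs() {
    deb_dir="$1"   # directory holding the generated .deb files
    target="$2"    # hypothetical SSH target for the DPU
    run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }
    run ssh "$target" "mkdir -p /tmp/frr-debs"
    run scp "$deb_dir"/*.deb "$target:/tmp/frr-debs/"
    run ssh "$target" "dpkg -i /tmp/frr-debs/*.deb"
}

# Dry run: shows the three commands without touching the network.
deploy_debs "$HOME/code" root@dpu-arm
```

Keeping the dry-run default makes it easy to review the exact file list before anything is installed on the DPU.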

Table 2 shows how the build time compares with previous methods.

Environment                                  Real               User                Sys
DPU Arm (complete make)                      2 min 40.529 s     16 min 29.855 s     2 min 1.534 s
DPU Arm (Debian package)                     5 min 23.067 s     20 min 33.614 s     2 min 49.628 s
X86 + DOCA dev container (Debian package)    24 min 19.051 s    139 min 39.286 s    3 min 58.081 s

Table 2. DPU-Arm and X86 build times

The giant jump in build time surprised me because I have an amply stocked x86 server and no Docker limits. So, it seems throwing CPUs and RAM at a problem doesn’t always help! The degradation comes from emulating the Arm architecture on x86, as the next option shows. 

Developing in an AWS Graviton instance 

Next, I tried building my app natively on Arm but this time on an external server with more horsepower. I used an Amazon EC2 Graviton instance for this purpose with specs comparable to my x86 server. 

  • Arm64 arch, Ubuntu 20.04 OS
  • 128G RAM 
  • 32 vCPUs 

root@ip-172-31-28-243:~# lscpu | grep -E "^CPU\(s\):|^Model name" 
CPU(s):              32 
Model name:   Neoverse-N1 
root@ip-172-31-28-243:~# grep MemTotal /proc/meminfo 
MemTotal:       129051172 kB 

To set up the DOCA and DPDK libraries in this instance, I installed the DOCA SDK repo meta package.

root@ip-172-31-28-243:~#  dpkg -i doca-repo-aarch64-ubuntu2004-local_1.1.1-1.5.4.2.4.1.3.bf.3.7.1.11866_arm64.deb 
root@ip-172-31-28-243:~#  apt update 
root@ip-172-31-28-243:~# apt install doca-sdk 

The remaining steps for cloning and building the FRR Debian package are the same as the previous option.  

Table 3 shows how the build fared on the AWS Arm instance.

Environment                                  Real               User                Sys
DPU Arm (complete make)                      2 min 40.529 s     16 min 29.855 s     2 min 1.534 s
DPU Arm (Debian package)                     5 min 23.067 s     20 min 33.614 s     2 min 49.628 s
X86 + DOCA dev container (Debian package)    24 min 19.051 s    139 min 39.286 s    3 min 58.081 s
AWS-Arm (Debian package)                     1 min 30.480 s     6 min 6.056 s       0 min 35.921 s

Table 3. DPU-Arm, X86 and AWS-Arm build times

The AWS Arm instance is the clear winner, no coffee needed.
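To put the gap in numbers, the wall-clock (real) times of the three Debian package builds can be compared directly. The sketch below only redoes arithmetic on the measurements reported in this post; the helper names to_seconds and ratio are hypothetical.

```shell
# Convert minutes and seconds to seconds, and divide two durations.
to_seconds() { awk -v m="$1" -v s="$2" 'BEGIN { printf "%.3f\n", m*60 + s }'; }
ratio()      { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

# Real times of the Debian package builds.
dpu=$(to_seconds 5 23.067)
x86=$(to_seconds 24 19.051)
aws=$(to_seconds 1 30.480)

echo "DPU Arm vs AWS-Arm:       $(ratio "$dpu" "$aws")x"
echo "x86 container vs AWS-Arm: $(ratio "$x86" "$aws")x"
echo "x86 container vs DPU Arm: $(ratio "$x86" "$dpu")x"
```

This prints 3.6x, 16.1x, and 4.5x. In other words, the emulated build took roughly 16 times as long as the native build on the AWS Arm instance, and about 4.5 times as long as packaging on the DPU itself.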

Figure 1 shows the compile times in these environments.

Figure 1. FRR build times with different options

Summary 

In this post, I discussed several development environments for DPU applications:

  • BlueField DPU 
  • DOCA dev container on an x86 server
  • AWS Graviton compute instance 

You can prototype your app directly on the DPU, experiment with developing in the x86 DOCA development container, and grab an AWS Graviton instance with DOCA to punch it into hyperspeed! 

For more information, see the following resources:

  • NVIDIA Introduces BlueField DPU as a platform for Zero Trust Security with DOCA 1.2
  • Building a Foundation for Zero Trust Security with NVIDIA DOCA 1.2
  • Introduction to NVIDIA DOCA for BlueField DPUs DLI course
  • DPU-Based Hardware Acceleration: A Software Perspective

Source: NVIDIA