Linux Network Observability: Building Blocks
As developers, operators and devops people, we are all hungry for visibility and efficiency in our workflows. Since Linux reigns over the “Open-Distributed-Virtualized-Software-Driven-Cloud-Era”, understanding what is available within Linux in terms of observability is essential to our jobs and careers.
Linux Community and Ecosystem around Observability
More often than not, and depending on the size of the company, it’s hard to justify the cost of developing debug and tracing tools unless they are for a product you are selling. Like any other Linux subsystem, the tracing and observability infrastructure and ecosystem continue to grow and advance thanks to mass innovation and the sheer advantage of distributed, accelerated development. Naturally, bringing native Linux networking to the open networking world makes these technologies readily available for networking.
There are many books and other resources available on Linux system observability today, so this may seem no different. This is a starter blog discussing some of the building blocks that Linux provides for tracing and observability, with a focus on networking. It is not meant to be an in-depth tutorial on observability infrastructure but a summary of the subsystems that are available today and are constantly being enhanced by the Linux networking community. Cumulus Networks products are built on these technologies and tools, and the same technologies and tools are available to others building on top of Cumulus Linux.
1. Netlink for networking events
Netlink is the protocol behind the core API for Linux networking. Apart from it being an API to configure Linux networking, Netlink’s most important feature is its asynchronous channel for networking events.
And here is where it can be used for network observability: every networking subsystem that supports configuration via Netlink (which they all do now, including ethtool in recent kernels) also exposes to userspace a way to listen to that subsystem’s networking events. These are called netlink groups (you will see an example here).
You can use this asynchronous event bus to monitor link events, control plane events, MAC learning events and any other networking event you want to chase!
The iproute2 project, maintained by kernel developers and packaged by all Linux distributions, is the best Netlink-based toolset for network configuration and visibility. There are also many open source Netlink libraries for programmatically using Netlink in your network applications (e.g. libnl and python nlmanager, to name a few).
Though you can build your own observability tools on Netlink, many are already available on any Linux system (and of course on Cumulus Linux). Try `ip monitor` and `bridge monitor`, or look for the monitor option in the other tools in the iproute2 networking package.
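As a rough illustration of listening on this event bus, here is a minimal Python sketch that subscribes to a few rtnetlink groups and prints the type of each message it receives. The constants are copied from `<linux/rtnetlink.h>`; the function names `parse_nlmsg_headers` and `monitor` are made up for this example, and the sketch only decodes the netlink message header, not the attributes inside each message.

```python
import socket
import struct

# Multicast group bitmasks from <linux/rtnetlink.h>
RTMGRP_LINK = 0x1
RTMGRP_NEIGH = 0x4
RTMGRP_IPV4_ROUTE = 0x40

# A few rtnetlink message types, also from <linux/rtnetlink.h>
RTM_NAMES = {
    16: "RTM_NEWLINK", 17: "RTM_DELLINK",
    24: "RTM_NEWROUTE", 25: "RTM_DELROUTE",
    28: "RTM_NEWNEIGH", 29: "RTM_DELNEIGH",
}

# struct nlmsghdr: length, type, flags, sequence number, port id
NLMSG_HDR = struct.Struct("=IHHII")


def parse_nlmsg_headers(data):
    """Return (type_name, length) for each netlink message in a buffer."""
    messages = []
    offset = 0
    while offset + NLMSG_HDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSG_HDR.unpack_from(data, offset)
        if length < NLMSG_HDR.size or offset + length > len(data):
            break  # truncated or malformed message, stop parsing
        messages.append((RTM_NAMES.get(msg_type, f"type {msg_type}"), length))
        offset += (length + 3) & ~3  # messages are aligned to 4 bytes
    return messages


def monitor(groups=RTMGRP_LINK | RTMGRP_NEIGH | RTMGRP_IPV4_ROUTE):
    """Print link, neighbour and IPv4 route events until interrupted (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    sock.bind((0, groups))  # (port id chosen by kernel, group bitmask)
    while True:
        for name, length in parse_nlmsg_headers(sock.recv(65535)):
            print(f"{name} ({length} bytes)")
```

Calling `monitor()` in one terminal and bringing an interface up or down in another should produce `RTM_NEWLINK` lines. A real tool would go on to decode the message payload, which is what libraries such as libnl do for you.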
Apart from the monitor commands, iproute2 also provides commands to look up an entry in a networking table (these are very handy for tracing and for implementing tracing tools):
e.g. `ip route get`, `ip route get fibmatch`, `bridge fdb get`, `ip neigh get`.
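If you want to call these lookups from a tracing script, one option is to ask iproute2 for JSON output with `-j` and parse it. The sketch below assumes the JSON keys `dst`, `gateway` and `dev`, which may vary between iproute2 versions; the helper names are invented for this example, and the addresses used in the usage note are documentation-range placeholders.

```python
import json
import subprocess


def route_get_argv(dst, fibmatch=False):
    """Build the `ip -j route get [fibmatch] DST` command line."""
    argv = ["ip", "-j", "route", "get"]
    if fibmatch:
        argv.append("fibmatch")  # return the matching FIB entry, not the resolved route
    argv.append(dst)
    return argv


def summarize_route(json_text):
    """Pick the next-hop fields out of `ip -j route get` output (assumed key names)."""
    return [
        {"dst": r.get("dst"), "via": r.get("gateway"), "dev": r.get("dev")}
        for r in json.loads(json_text)
    ]


def route_get(dst, fibmatch=False):
    """Run the lookup and return a list of summarized routes."""
    result = subprocess.run(
        route_get_argv(dst, fibmatch), capture_output=True, text=True, check=True
    )
    return summarize_route(result.stdout)
```

For example, `route_get("192.0.2.1", fibmatch=True)` would run `ip -j route get fibmatch 192.0.2.1` and return the device and gateway of the matching entry, or raise `CalledProcessError` if the kernel has no route.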
2. Linux perf subsystem
Perf is a Linux kernel subsystem that provides a framework for all things performance analysis. Apart from hardware-level performance counters, it covers software tracepoints. These software tracepoints are the interesting part for network event observability. Of course, you will also use perf for packet performance, system performance, debugging your network protocol performance and other performance problems you see on your networking box.
Coming back to tracepoints for a bit, tracepoints are static tracing hooks in kernel or user-space. Not only does perf allow you to trace these statically placed kernel tracepoints, it also allows you to dynamically add probes into kernel and user-space functions to trace and observe data as it flows through the code. This is my go-to tool for debugging a networking event.
Kernel networking code has many tracepoints for you to use. Use `perf list` to list all the events and grep for strings like ‘neigh’, ‘fib’, ‘skb’, ‘net’ for networking tracepoints. We have added a few to trace E-VPN control events.
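The "list and grep" step can also be scripted. The following Python sketch filters `perf list` output down to tracepoints whose names contain one of the networking keywords mentioned above. It assumes the usual `name   [Tracepoint event]` line layout of `perf list`; the function names are hypothetical.

```python
import subprocess

NET_KEYWORDS = ("neigh", "fib", "skb", "net")


def net_tracepoints(perf_list_output, keywords=NET_KEYWORDS):
    """Return tracepoint names from `perf list` text that match any keyword."""
    events = []
    for line in perf_list_output.splitlines():
        if "[Tracepoint event]" not in line:
            continue  # skip hardware, software and cache events
        name = line.split()[0]
        if any(keyword in name for keyword in keywords):
            events.append(name)
    return events


def list_net_tracepoints():
    """Run `perf list` (usually needs root to see tracepoints) and filter it."""
    output = subprocess.run(
        ["perf", "list"], capture_output=True, text=True, check=True
    ).stdout
    return net_tracepoints(output)
```

The returned names can be passed straight to `perf record -e`.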
For example, this is my go-to perf record line for E-VPN. It gives you visibility into the life cycle of neighbour, bridge and vxlan fdb entries:
perf record -e neigh:* -e bridge:* -e vxlan:*
And you can use this one to trace fib lookups in the kernel.
perf record -e fib:* -e fib6:*
Perf trace is also a good tool to trace a process or command execution. It is similar to everyone’s favorite, strace. Tools like strace and perf trace are useful when something is failing and you want to know which syscall is failing.
Note that perf also provides a python interface which greatly helps in extending its capabilities and adoption.
3. Function Tracer or ftrace
Though this is called a function tracer, it is a framework for several other kinds of tracing. As the ftrace documentation suggests, one of the most common uses of ftrace is event tracing. Similar to perf, it allows you to dynamically trace kernel functions, and it is a very handy tool if you know your way around the networking stack (grepping for common Linux networking strings like net, bonding and bridge is good enough to find your way around with ftrace). There are many tutorials available on ftrace depending on how deep you want to go. There is also trace-cmd, a command-line front end for ftrace.
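Event tracing with ftrace is driven through files under tracefs, so it is easy to script. Here is a minimal Python sketch, assuming tracefs is mounted at `/sys/kernel/tracing` (older systems expose it at `/sys/kernel/debug/tracing`) and that you run it as root. The helper names are made up for this example.

```python
from pathlib import Path

# Assumed tracefs mount point; pass root= to override
TRACEFS = Path("/sys/kernel/tracing")


def event_enable_path(subsystem, event=None, root=TRACEFS):
    """Path of the `enable` file for one event, or for a whole event subsystem."""
    path = Path(root) / "events" / subsystem
    if event:
        path = path / event
    return path / "enable"


def set_event(subsystem, event=None, on=True, root=TRACEFS):
    """Turn an event (or every event in a subsystem) on or off."""
    event_enable_path(subsystem, event, root).write_text("1" if on else "0")


def read_trace(root=TRACEFS, limit=20):
    """Return up to `limit` lines from the trace buffer, skipping header comments."""
    lines = (Path(root) / "trace").read_text().splitlines()
    return [line for line in lines if not line.startswith("#")][:limit]
```

For instance, `set_event("neigh")` followed a little later by `read_trace()` would show recent neighbour table events; `set_event("neigh", on=False)` turns them off again.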
4. Linux auditing subsystem
The Linux kernel auditing subsystem exists to track critical system events. These are mostly security-related events, but the subsystem is also a good way to observe and monitor your Linux system. For networking systems, it allows you to trace network connections, changes in network configuration and syscalls.
Auditd, a userspace daemon, works with the kernel for policy management, filtering and streaming records. There are many blogs and documents that cover the details of how to use the audit system. It comes with very handy tools, autrace and ausearch, to trace and search for audit events. Autrace is similar to strace. The audit subsystem also has an iptables target that allows you to record network packets as audit records.
5. Systemd
Systemd might seem odd here, but I rely on systemd tools a lot to chase my networking problems. Journalctl is your friend when you want to look for critical errors from your networking protocol daemons and services.
E.g. check whether your networking service has already given you hints about something failing: journalctl -u networking
Systemd timers can help you set up periodic monitoring of your networking state, and tools like systemd-analyze can provide visibility into control plane protocol service dependencies at boot and shutdown, which helps when looking at convergence times.
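The same check is easy to automate. Below is a small Python sketch that asks journalctl for JSON output and pulls out the priority and message of each record. `PRIORITY` and `MESSAGE` are standard journal fields; the function names and the default of the last 50 error-level records are choices made for this example.

```python
import json
import subprocess


def journal_argv(unit, priority="err", lines=50):
    """Build a journalctl command for recent records of a unit at or above a priority."""
    return ["journalctl", "-u", unit, "-p", priority,
            "-n", str(lines), "-o", "json", "--no-pager"]


def parse_journal(json_lines):
    """Return (priority, message) pairs from `journalctl -o json` output."""
    entries = []
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # PRIORITY is a string in the JSON output; default to 6 (info) if absent
        entries.append((int(record.get("PRIORITY", 6)), record.get("MESSAGE", "")))
    return entries


def unit_errors(unit="networking"):
    """Fetch and parse recent error-level records for a unit."""
    output = subprocess.run(
        journal_argv(unit), capture_output=True, text=True, check=True
    ).stdout
    return parse_journal(output)
```

Note that journalctl emits `MESSAGE` as an array of byte values when a message is not valid UTF-8, so a more careful tool would handle that case too.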
6. eBPF
Though eBPF is the most powerful of all the above, it is listed last here simply because you can use eBPF with all of the observability tools and subsystems above. Just google the name of your favorite Linux observability technology or tool together with eBPF and you will find an answer :).
Linux kernel eBPF has brought programmability to all Linux kernel subsystems, and tracing and observability are no exception. There are several blogs, books and other resources available to get started with eBPF. The Linux eBPF maintainers have done an awesome job bringing eBPF mainstream (with documentation, libraries, tutorials, helpers, examples, etc.).
eBPF takes all the tracing and observability events described in the previous sections one step further by letting you dynamically insert code at these trace events.
eBPF is also used for container network observability and policy enforcement (the Cilium project is a great example).
A reminder in closing
All these tools are at your disposal on Cumulus Linux, either by default or from the Cumulus and Debian repos!
Source: Cumulus Networks