ONIE 2020 update
Even if it’s free, you still have to sell it. A solution only works if people want to use it.
Last year I became the project lead for ONIE: the Open Network Install Environment. If you’re unfamiliar with this, ONIE is an open source project for installing operating systems on network switches. Manufacturers will start with the core ONIE code, add support for their new hardware (so that their new switch has an industry-standard way of installing an operating system) and then submit those changes back to the ONIE project.
As of 2020, this has happened over two hundred times, along with well over a thousand contributions of bug fixes and improvements. As these changes are submitted, they need to be quality checked and tested to make sure they build cleanly. When I became the project lead, I had already been working on build tools at Cumulus Networks and decided my first contribution was going to be creating a standard build environment for ONIE that could be deployed anywhere.
This went great until my final test of the new build environment, which was to build every platform ONIE supported. That didn’t go as well as I’d hoped: older platforms that had previously built on a Debian 8 system wouldn’t build under Debian 9.
While this proved to be a disappointment, it was not at all surprising. As build tools evolve they become better at flagging issues or may move files to new locations, creating problems that code written seven years ago would have never anticipated.
So now my new build environment had to meet two requirements:
1. Older platform builds should not break.
2. New platform builds must not be prevented from building with current tools.
Now, this would be easier to implement if requirement two didn’t frequently break requirement one.
Ideally, this would be addressed by updating the code for all platforms and then individually testing them, but this was impeded by a lack of both time and resources in that:
Updating 200+ platform builds to function with the latest tools represents a significant amount of effort, and has a questionable return on investment. Is rebuilding a four-year-old platform with the latest compilers really better than using the ones that were originally used to build it?
Also, I don’t have 200+ platforms to use to test the resulting builds.
I came to the conclusion that I could resolve this by creating build environments that were current for when the code for a particular platform was developed. This way, the code should build and function the way it was always intended to, and new development wouldn’t be hampered.
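As a rough illustration of that idea (this is not code from ONIE or from any tool mentioned here), the sketch below matches a platform to the Debian stable release that was current when its code was written. The release dates are Debian's published ones; the platform names and development dates are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Debian stable release dates, oldest first. Each platform is matched to the
# newest release that already existed when its code was developed.
DEBIAN_RELEASES = [
    (date(2013, 5, 4), 7),
    (date(2015, 4, 25), 8),
    (date(2017, 6, 17), 9),
    (date(2019, 7, 6), 10),
]


def debian_release_for(developed_on: date) -> int:
    """Return the Debian major version that was current on the given date."""
    release_dates = [released for released, _ in DEBIAN_RELEASES]
    index = bisect_right(release_dates, developed_on) - 1
    if index < 0:
        raise ValueError(f"no known Debian release on or before {developed_on}")
    return DEBIAN_RELEASES[index][1]


# Hypothetical platforms and dates, for illustration only.
PLATFORMS = {
    "example_switch_2016": date(2016, 3, 1),
    "example_switch_2018": date(2018, 9, 15),
}

if __name__ == "__main__":
    for name, developed_on in PLATFORMS.items():
        release = debian_release_for(developed_on)
        print(f"{name}: build in a Debian {release} environment")
```

In practice the mapping would more likely be recorded per platform after a successful build than derived from dates, but the lookup shows the principle: the platform picks the era of its tools.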
I decided to use Docker containers to implement this solution as they’re easy to deploy, fairly lightweight and already come in a variety of OS versions. Getting a Debian 8 or Debian 9 image to configure is just a download, so at this point, I was thinking all I had to do was “Just use a Dockerfile” to specify the final configuration and that would be it.
But of course it wasn’t that easy, because I did the one thing good software developers always do.
I ate my own dog food. And well, it was awful.
Look, build environments are straightforward: you add the software’s build dependencies and run the build. It’s an easy problem to solve because it’s predictable. It’s just one thing. It’s the same thing, every time. That’s why I thought my logic of “just use a Dockerfile” would work.
On the other hand, development environments are all about working with software that is constantly changing and, as a result, they’re a constantly shifting morass of personal preference, shortcuts, dead ends and things that seemed like a good idea at the time. If build is a one-way street, then development is Boston traffic.
Besides, as a maintainer of build tools, I’ve found container development environments are a tough sell because there are so many minor, shouldn’t-be-a-big-deal-but-they-are problems that nobody wants to use them. It becomes the death of a thousand cuts because even the most basic tasks come with a “yeah, but.”
For example, try to move a container file to the host system.
Copy it out? Yeah, but you’ll have to scp it to your host. Is scp installed?
Save it locally? Yeah, but you had to explicitly mount a directory. Did you?
Copy it to the directory you actually did remember to mount? Yeah, but you’re root here so the file will be owned by root outside the container. Hope you can become root!
Now you can work around most of these details if you know what you’re doing, but you can’t really expect that of anyone else. They want a solution, not a To-Do list. However, the silver lining is that since these problems are common to developing in containers for any build target, a solution can apply to developing for all build targets, not just to ONIE.
So, development rant aside, I still needed a solution that would both let me build ONIE with different versions of containers, and be something developers would actually want to use. Conveniently, I’ve spent the last five years doing just that for a living, and am working for a company that has a history of supporting open source. So after some legal review, Cumulus Networks agreed to let me open source a build tool to support ONIE.
What I came up with was Dedicated User Environment, or DUE: a build environment for creating build environments. It’s absolutely no coincidence that ONIE is explicitly supported; I’ve been using DUE for the last two ONIE quarterly releases and I’m pleased with the results.
Apart from hosting recipes to create project-specific build environments, DUE has a container launcher application which mitigates the “death by a thousand cuts” I was just ranting about. The launcher is a much bigger deal than you’d think, and I believe it is the key to making DUE an acceptable solution. So rather than bemoaning the difficulties of developing in a Docker container, here’s a list of things DUE provides that make it the best solution for supporting ONIE build and development.
The supplied templates create known working build container configurations. For example, the list of Debian 8 ONIE build environment packages varies a little from the list for Debian 9, but users don’t have to care because I got burned on it once, checked in a fix, and now DUE just takes care of the rest.
The containers solve the historical build environment preservation issue. One host system can easily have containers to build and debug every platform, from any era.
Since DUE always generates the same container, there’s now a standard build environment that can be used by developers anywhere to reproduce issues. This is extremely useful when a project is built all over the globe.
The launcher application supplies run time defaults to the container to do things like:
● Mount the user’s home directory, so .config files and aliases are available.
● Mount the build area, for file transfer in and out of the container.
● Add the invoking user inside the container, so files created by them are owned by them outside the container.
(Which, you may notice, solves the problems I was ranting about earlier.)
The launcher application means less typing, which results in fewer steps and fewer mistakes. Users don’t have to be Docker experts, which lowers the bar for tool adoption by maintainers and users.
DUE is written in Bash, so the code is easy to understand and modify, since it looks like the commands you’d want to run anyway.
DUE works on shared user systems. Many users can run containers off a single configured image without colliding with each other.
It is build automation-friendly, since container login isn’t required to do a build. Just invoke DUE on a directory and it builds in the specified container’s environment.
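To make the launcher defaults above concrete, here is a minimal sketch of the kind of `docker run` command line a launcher could assemble. It is not DUE's implementation (DUE is written in Bash and adds the invoking user inside the container; this sketch substitutes Docker's simpler `--user uid:gid` option), and the function name, image name and paths are hypothetical. Only standard `docker run` options are used.

```python
import os
import shlex


def docker_run_args(image, build_dir, command=None, *, home=None, uid=None, gid=None):
    """Assemble a `docker run` argument list with launcher-style defaults."""
    home = os.path.expanduser("~") if home is None else home
    build_dir = os.path.abspath(build_dir)
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid

    args = [
        "docker", "run", "--rm",
        # Mount the home directory so dotfiles and aliases are available.
        "--volume", f"{home}:{home}",
        # Mount the build area at the same path for file transfer in and out.
        "--volume", f"{build_dir}:{build_dir}",
        "--workdir", build_dir,
        # Run as the invoking user so created files are not owned by root.
        "--user", f"{uid}:{gid}",
        "--env", f"HOME={home}",
    ]
    if command:
        # Non-interactive: run the build command and exit, no login needed.
        args += [image, *command]
    else:
        # Interactive session using the image's default command.
        args += ["--interactive", "--tty", image]
    return args


if __name__ == "__main__":
    # Hypothetical image name and build directory, for illustration only.
    argv = docker_run_args(
        "onie-build:debian-9",
        "/work/onie/build-config",
        ["make", "MACHINE=kvm_x86_64", "all"],
        home="/home/dev", uid=1000, gid=1000,
    )
    print(shlex.join(argv))
```

Passing a command gives the automation-friendly case, where the build runs in the container's environment without anyone logging in. Leaving it out gives an interactive session with the same mounts and user identity.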
Of course, a solution is no good if nobody can locate or use it. Therefore DUE is available on GitHub at https://github.com/ehdoyle/DUE and is open sourced under the MIT license.
The reasons mentioned above are why I’ll be recommending DUE to solve historical build problems at the 2020 OCP Global Summit this March. That’s also why I would like to get the community’s feedback on this plan, good, bad or otherwise. As developers, we all have preferences as to how things should be done, and it would be both rude and ignorant of me to not take those considerations into account. At the end of the day, I’m looking for a solution for ONIE that simply works, and as the project leader, I know I won’t be successful if I’m trying to lead everyone somewhere that nobody wants to go.
Source: Cumulus Networks