Historically, data center servers have used motherboards that included all key components on a single circuit board. The DC-SCM (Datacenter-ready Secure Control Module) decouples server management and security functions from a traditional server motherboard, enabling development of server management and security solutions independent of server architecture. It also provides opportunities for reducing server printed circuit board (PCB) material cost, and allows unified firmware images to be developed.
Today, Cloudflare is announcing that it has partnered with Lenovo to design a DC-SCM for our next-generation servers. The design specification has been published to the OCP (Open Compute Project) contribution database under the name Project Argus.
A brief introduction to baseboard management controllers
A baseboard management controller (BMC) is a specialized processor that can be found in virtually every server product. It allows remote access to the server through a network connection, and provides a rich set of server management features. Some of the commonly used BMC features include server power management, device discovery, sensor monitoring, remote firmware update, system event logging, and error reporting.
In a typical server design, the BMC resides on the server motherboard, along with other key components such as the processor, memory, CPLD and so on. This was the norm for generations of server products, but that has changed in recent years as motherboards are increasingly optimized for high-speed signal bandwidth, and servers need to support specialized security requirements. This has made it necessary to decouple the BMC and its related components from the server motherboard, and move them to a smaller common form factor module known as the Datacenter Secure Control Module (DC-SCM).
Figure 1 is a picture of a motherboard used on Cloudflare’s previous generation of edge servers. The BMC and its related circuit components are placed on the same printed circuit board as the host CPU.
Figure 1: Previous Generation Server Motherboard
For Cloudflare’s next generation of edge servers, we are partnering with Lenovo to create a DC-SCM based design. On the left-hand side of Figure 2 is the printed circuit board assembly (PCBA) for the Host Processor Module (HPM). It hosts the CPU, the memory slots, and other components required for the operation and features of the server design. But the BMC and its related circuits have been relocated to a separate PCBA, which is the DC-SCM.
Figure 2: Next Generation HPM and DC-SCM
Benefits of DC-SCM based server design
PCB cost reduction
As of today, DDR5 memory runs at 6400MT/s (mega transfers per second). In the future DDR5 speed may even increase to 7200MT/s or 8800MT/s. Meanwhile, PCIe Gen5 is running at 32 GT/s (giga transfers per second), doubling the speed rate of PCIe Gen4. Both DDR5 and PCIE Gen5 are key interfaces for the processors used on our next-generation servers.
The increasing rates of high-speed IO signals and memory buses are pushing the next generation of server motherboard designs to transition from low-loss to ultra-low loss dielectric printed circuit board (PCB) materials, and higher layer counts in the PCB. At the same time, the speed of BMC and its related circuitry are not progressing so quickly. For example, the physical layer interface of ASPEED AST2600 BMC is only at PCIe Gen2 (5 GT/s).
Ultra-low loss dielectric PCB material and higher PCB layer count are both driving factors for higher PCB cost. Another driving factor of PCB cost is the size of the PCB. In a traditional server motherboard design, the size of the server motherboard is larger, since the BMC and its related circuits are placed on the same PCB as the host CPU.
By decoupling the BMC and its related circuitry from the host processor module (HPM), we can reduce the size of the relatively more expensive PCB for the HPM. BMC and its related circuitry can be placed on relatively cheaper PCB, with reduced layer count and lossier PCB dielectric materials. For example, in the design of Cloudflare’s next generation of servers, the server motherboard PCB needs to be 14 or more layers, whereas the BMC and its related components can be easily routed with 8 or 10 layers of PCB. In addition, the dielectric material used on DC-SCM PCB is low-loss dielectric — another cost saver compared to ultra-low loss dielectric materials used on HPM PCB.
Modularized design enables flexibility
DC-SCM modularizes server management and security components into a common add-in card form factor, enabling developers to remove customer specific solutions from the more complex components, such as motherboards, to the DC-SCM. This provides flexibility for developers to offer multiple customer-specific solutions, without the need to redesign multiple motherboards for each solution.
Developers are able to reuse the DC-SCM from a previous generation of server design, if the management and security requirements remain the same. This reduces the overall cost of upgrading to a new generation of servers, and has the potential to reduce e-waste when a server is decommissioned.
Likewise, management and security solution upgrades within a server generation can be carried out separately by modifying or replacing the DC-SCM. The more complex components on the HPM do not need to be redesigned. From a data center perspective, it speeds up the upgrade of management and security hardware across multiple server platforms.
Unified interoperable OpenBMC firmware development
Data center secure control interface (DC-SCI) is a standardized hardware interface between DC-SCM and the Host Processor Module (HPM). It provides a basis for electrical interoperability between different DC-SCM and host processor module (HPM) designs.
This interoperability makes it possible to have a unified firmware image across multiple DC-SCM designs, concentrating development resources on a single firmware rather than an array of them. The publicly-accessible OpenBMC repository provides a perfect platform for firmware developers of different companies to collaborate and develop such unified OpenBMC images. Instead of maintaining a separate BMC firmware image for each platform, we now use a single image that can be applied across multiple server platforms. The device tree specific to each respective server is automatically loaded based on device product information.
Using a unified OpenBMC image significantly simplifies the process of releasing BMC firmware to multiple server platforms. Firmware updates and changes are propagated to all supported platforms in a single firmware release.
The DC-SCM specifications have been driven by the Open Compute Project (OCP) Foundation hardware management workstream, as a way to standardize server management, security, and control features.
Cloudflare has partnered with Lenovo on what we call Project Augus, Cloudflare’s first DC-SCM implementation that fully adheres to the DC-SCM 2.0 specification. In the DC-SCM 2.0 specifications, a few design items are left open for implementers to decide on the most suitable architectural choices. With the goal of improving interoperability of Cloudflare DC-SCM designs across server vendors and server designs, Project Argus includes documentation on implementation details and design decisions on form factor, mechanical locking mechanism, faceplate design, DC-SCI pin out, BMC chip, BMC pinout, Hardware Root of Trust (HWRoT), HWRoT pinout, and minimum bootable device tree.
Figure 3: Project Argus DC-SCM 2.0
At the heart of the Project Argus DC-SCM is the ASPEED AST2600 BMC System on Chip (SoC), which when loaded with a compatible OpenBMC firmware, provides a rich set of common features necessary for remote server management. ASPEED AST1060 is used on Project Argus DC-SCM as the HWRoT solution, providing secure firmware authentication, firmware recovery, and firmware update capability. Project Argus DC-SCM 2.0 uses Lattice MachXO3D CPLD with secure boot and dual boot ability as the DC-SCM CPLD to support a variety of IO interfaces including LTPI, SGPIO, UART and GPIOs.
The mechanical form factor of Project Argus DC-SCM 2.0 is the horizontal External Form Factor (EFF).
Cloudflare and Lenovo have contributed Project Argus Design Specification and reference design files to the OCP contribution database. Below is a detailed list of our contribution:
- SPI, I2C/I3C, UART, LTPI/SGPIO block diagrams
- DC-SCM PCB stackup
- DC-SCM Board placements (TOP and BOTTOM layers)
- DC-SCM schematic PDF file
- DC-SCI pin definition PDF file
- Power sequence PDF file
- DC-SCM bill of materials Excel spreadsheet
- Minimum bootable device tree requirements
- Mechanical Drawings PDF files, including card assembly drawing and interlock rail drawing
The security foundation for our Gen 12 hardware
Cloudflare has been innovating around server design for many years, delivering increased performance per watt and reduced carbon footprints. We are excited to integrate Project Argus DC-SCM 2.0 into our next-generation, Cloudflare Gen 12 servers. Stay tuned for more exciting updates on Cloudflare Gen 12 hardware design!