How adding capacity to a network could reduce IT costs

Finding an enterprise looking to spend more on network equipment is like finding gold while digging for worms for bait. Cost management used to be one option for enhancing profits, but these days it’s a mantra. Given this, you can understand my surprise when I heard from some enterprises (so far, just a few) that they are looking at increasing their network capacity, and I was even more surprised to hear why. It wasn’t to support new applications but to lower costs!

How do you buy more capacity to lower costs? The answer lies in the relationship between network capacity and operating expenses (opex), the cost of running the network so it serves its business goals. Network operators have long said that opex eats more of each revenue dollar than capital expenses (capex) do, and some enterprises have said the same thing. What’s new here isn’t that enterprises are finding their focus on controlling equipment spending has actually raised opex more than it has reduced capex, even though that’s true. What they’re saying is that spending more to increase network capacity could reduce opex a lot, and could even reduce capex overall. That would lower network costs and improve company profits.

Have you heard the phrase “bandwidth economy of scale”? It’s a sophisticated way of saying that the cost per bit to move a lot of bits is less than the cost per bit to move a few. Over the decades in which information technology evolved from punched cards to PCs and mobile devices, we’ve taken advantage of this principle by concentrating traffic from the access edge inward onto fast trunks. Applied to local-area and data-center networks, which account for most of the network equipment enterprises purchase, it means starting with thin pipes at the edge and moving to fat trunks between switches. Old-line LAN planners tell me people would argue over the best ratio of input speed to trunk speed on a switch: was it 6:1 or 10:1?
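The ratio the old-line planners argued over can be sketched in a few lines of Python. This is a minimal illustration, assuming the ratio means aggregate edge-port capacity divided by aggregate trunk capacity on one switch (the usual “oversubscription” reading); the function name and the port counts are hypothetical, not taken from any planner’s method.

```python
def oversubscription_ratio(edge_ports: int, edge_gbps: float,
                           trunk_ports: int, trunk_gbps: float) -> float:
    """Aggregate edge capacity divided by aggregate trunk capacity.

    Assumption: this is how the "input speed to trunk speed" ratio is read.
    """
    if trunk_ports <= 0 or trunk_gbps <= 0:
        raise ValueError("a switch needs at least one trunk with nonzero speed")
    return (edge_ports * edge_gbps) / (trunk_ports * trunk_gbps)


if __name__ == "__main__":
    # Hypothetical access switch: 48 x 1Gbps edge ports, one 10Gbps trunk.
    classic = oversubscription_ratio(48, 1, 1, 10)
    # Hypothetical "fat pipe" switch: 48 x 10Gbps edge ports, four 100Gbps trunks.
    fat = oversubscription_ratio(48, 10, 4, 100)
    print(f"classic design:  {classic:.1f}:1")
    print(f"fat-pipe design: {fat:.1f}:1")
```

Under these assumptions the classic design works out to 4.8:1, while the fat-pipe design is 1.2:1, so the trunks have far more headroom relative to what the edge can offer them.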

My bandwidth-saves-money cadre says that’s all wrong. You get the fattest reasonable pipe you can buy, everywhere. Forget anything slower than 1Gbps, even at the edge, and go for 10Gbps where the edge device (PC or server) can support it. For trunks, go for 40Gbps to 100Gbps, and check the price points of even faster trunks to work out the cost-per-bit curve. You also want switches with high overall capacity and plenty of trunking interfaces, rather than layers of switching. All of this is meant to save operations costs, for two reasons.

Reason one is simple: operations costs are proportional to network complexity, which is proportional to the number of devices. Cut out layers, and you cut complexity and opex. One company told me its headquarters LAN remake cut the number of switches to one-third of the prior level and cut operations costs by more than half.

That’s not the end of the benefits, enterprises say. The second reason is more complicated, even though it starts out simply enough.

Higher capacity throughout the network means less congestion. It’s old-think, they say, to assume that faster LAN connections to users and servers will admit more traffic and congest trunks. “Applications determine traffic,” one CIO pointed out. “The network doesn’t suck data into it at the interface. Applications push it.” Faster connections mean less congestion, which means fewer complaints. They also leave more alternate paths available without delay and loss, which reduces complaints further. In fact, anything that creates packet loss, outages, or even latency creates complaints, and addressing complaints is a big source of opex. The complication is that network speed affects user and application quality of experience in several ways that go beyond the obvious congestion effects.
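The CIO’s point (applications set the load; the network only sets how much headroom that load has) can be illustrated with the textbook M/M/1 queue. This is a simplifying assumption, not a claim about any real LAN: it treats a trunk as a single queue with random arrivals and a fixed average packet size, and the 6Gbps load and 1,500-byte packets are hypothetical.

```python
def utilization(offered_gbps: float, capacity_gbps: float) -> float:
    """Fraction of a link's capacity consumed by the application-driven load."""
    return offered_gbps / capacity_gbps


def mm1_mean_delay_us(offered_gbps: float, capacity_gbps: float,
                      packet_bytes: int = 1500) -> float:
    """Mean time in system (queueing plus transmission) for an M/M/1 queue, in microseconds.

    M/M/1 gives T = 1 / (mu - lambda) with rates in packets per second, which
    equals packet_bits / (capacity - load) when both rates are in bits per second.
    Returns infinity when the load meets or exceeds capacity.
    """
    if offered_gbps >= capacity_gbps:
        return float("inf")
    packet_bits = packet_bytes * 8
    seconds = packet_bits / ((capacity_gbps - offered_gbps) * 1e9)
    return seconds * 1e6


if __name__ == "__main__":
    load = 6.0  # hypothetical load pushed by applications, in Gbps; unchanged by link speed
    for trunk in (10, 40, 100):
        print(f"{trunk:>3}G trunk: utilization {utilization(load, trunk):.2f}, "
              f"mean delay {mm1_mean_delay_us(load, trunk):.2f} us")
```

Because the load stays fixed, the faster trunk simply runs at lower utilization, and in this model mean delay drops from about 3 µs at 10Gbps to roughly 0.35 µs at 40Gbps. Real traffic rarely fits M/M/1’s assumptions, so treat the numbers as a direction rather than a forecast.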

When a data packet passes through a switch or router, it’s exposed to two things that can delay it. One is congestion; the other is “serialization delay.” This complex-sounding term reflects the fact that a store-and-forward device can’t switch a packet until it has all of it, so every packet is delayed until it has been completely received. The length of that delay depends on the packet’s size and the speed of the connection it arrives on, so faster interfaces always cut this part of latency. The total serialization delay a packet experiences is the sum of the delays at each interface it passes through.
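The serialization arithmetic is simple enough to sketch. This minimal Python example assumes store-and-forward switching at every hop and ignores propagation, processing, and queueing delay; the 1,500-byte packet and the four-interface paths are hypothetical illustrations.

```python
def serialization_delay_us(packet_bytes: int, link_gbps: float) -> float:
    """Time to clock one whole packet across an interface, in microseconds."""
    return packet_bytes * 8 / (link_gbps * 1e9) * 1e6


def path_serialization_us(packet_bytes: int, link_speeds_gbps: list[float]) -> float:
    """Sum the per-interface serialization delays along a store-and-forward path."""
    return sum(serialization_delay_us(packet_bytes, s) for s in link_speeds_gbps)


if __name__ == "__main__":
    packet = 1500  # bytes; a common Ethernet payload size, used here for illustration
    # Hypothetical paths: edge link, two trunks, edge link.
    thin_edge = [1, 10, 10, 1]
    fat_everywhere = [10, 100, 100, 10]
    print(f"1G edge / 10G trunks:   {path_serialization_us(packet, thin_edge):.2f} us")
    print(f"10G edge / 100G trunks: {path_serialization_us(packet, fat_everywhere):.2f} us")
```

With these assumptions a 1,500-byte packet spends 12 µs on each 1Gbps interface and 1.2 µs on each 10Gbps one, so the thin-edge path totals 26.4 µs of serialization delay, against 2.64 µs when every link is ten times faster.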

Application designs, component costs and AI reshape views on network capacity

You might wonder why enterprises are starting to look at this capacity-solves-problems point now, rather than years or decades earlier. They say there’s both a demand-side and a supply-side answer.

On the demand side, increased componentization of applications, including splitting component hosting between the data center and the cloud, has radically increased the complexity of application workflows. Monolithic applications have simple workflows: input, process, output. Componentized ones have to move messages among the components, and each of these movements depends on network connectivity, so the network is more tightly bound to application availability and performance. Not only that, complex workflows make it harder to determine what’s wrong and how to fix it. Finally, remember serialization delay? Every network interface a message crosses between components adds to it, eating up part of the delay budget every application has.
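To see how componentization eats into the delay budget, the sketch below multiplies per-interface serialization delay by the number of component-to-component hops in a workflow. Everything here is a hypothetical illustration: it assumes each message fits in a single 1,500-byte packet, each hop crosses four interfaces of the same speed, and the application has a made-up 10 ms delay budget. It counts serialization only, not processing or queueing.

```python
def workflow_serialization_us(component_hops: int, interfaces_per_hop: int,
                              message_bytes: int, link_gbps: float) -> float:
    """Serialization delay accumulated across every hop in a workflow, in microseconds.

    Assumes each message fits in one packet, so delays simply add per interface.
    """
    per_interface_us = message_bytes * 8 / (link_gbps * 1e9) * 1e6
    return component_hops * interfaces_per_hop * per_interface_us


def budget_share(delay_us: float, budget_us: float) -> float:
    """Fraction of an application's delay budget consumed by the given delay."""
    return delay_us / budget_us


if __name__ == "__main__":
    BUDGET_US = 10_000  # hypothetical 10 ms end-to-end delay budget
    MESSAGE = 1500      # hypothetical single-packet message, in bytes
    for label, hops in (("monolithic", 1), ("componentized", 8)):
        for speed in (1, 10):
            d = workflow_serialization_us(hops, 4, MESSAGE, speed)
            print(f"{label:>13} @ {speed:>2}G: {d:7.1f} us "
                  f"({budget_share(d, BUDGET_US):.2%} of budget)")
```

The point is the multiplier: eight hops cost eight times the serialization delay of one, and only faster interfaces shrink each term.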

On the supply side, the cost of network adapters on systems and interfaces on network devices doesn’t rise linearly with speed. One network engineer pointed out that the cost per bit of an interface typically falls as speed increases, up to a point, and then starts to rise. The point where that curve turns upward has shifted as technology has improved, so building in extra capacity is more practical today. Ethernet standards have also evolved to better handle multiple paths between switches and different traffic priorities; multipath support is especially popular with enterprises that favor adding capacity to reduce opex.
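The engineer’s cost curve is easy to check against real quotes. The sketch below divides each interface price by its speed and reports both the cheapest speed per Gbps and the first speed where cost per Gbps turns upward. The prices are invented placeholders chosen only to produce the falling-then-rising shape the engineer described; they are not market data, and the dictionary and function names are hypothetical.

```python
from typing import Optional

# HYPOTHETICAL per-port prices (arbitrary currency units), keyed by speed in Gbps.
# Placeholders only: substitute real vendor quotes before drawing conclusions.
HYPOTHETICAL_PORT_PRICE = {1: 50.0, 10: 200.0, 25: 400.0, 40: 600.0, 100: 1200.0, 400: 8000.0}


def cost_per_gbps(prices: dict[int, float]) -> dict[int, float]:
    """Price divided by speed for each interface, ordered by speed."""
    return {speed: price / speed for speed, price in sorted(prices.items())}


def cheapest_speed(prices: dict[int, float]) -> int:
    """Speed with the lowest cost per Gbps."""
    per = cost_per_gbps(prices)
    return min(per, key=per.get)


def upturn_speed(prices: dict[int, float]) -> Optional[int]:
    """First speed whose cost per Gbps exceeds that of the next-slower option, if any."""
    per = cost_per_gbps(prices)
    speeds = list(per)
    for slower, faster in zip(speeds, speeds[1:]):
        if per[faster] > per[slower]:
            return faster
    return None


if __name__ == "__main__":
    for speed, cost in cost_per_gbps(HYPOTHETICAL_PORT_PRICE).items():
        print(f"{speed:>4}G: {cost:7.2f} per Gbps")
    print(f"cheapest per Gbps: {cheapest_speed(HYPOTHETICAL_PORT_PRICE)}G")
    print(f"curve turns upward at: {upturn_speed(HYPOTHETICAL_PORT_PRICE)}G")
```

With these placeholder prices, cost per Gbps bottoms out at 100Gbps and turns upward at 400Gbps. The engineer’s observation is that, with real prices, that break point has moved over time, which is what makes building in extra capacity more practical.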

Then there’s AI. Interestingly, most of the enterprises now actively building local networks with bandwidth to burn are also early explorers of in-house AI hosting. AI in general, and model training in particular, generates a lot of server-to-server traffic, so the risk of congestion, delay, and packet loss is high. Most agree that AI will need lower latency and higher network capacity, particularly during training. They also note that because the amount and nature of AI-generated traffic is impossible for AI users to understand, congestion-related issues would generate all the more complaint calls. AI traffic might also affect other applications. Thus, AI hosting is a good reason to think seriously about adding capacity to the data center network.

Adding capacity to a network, enterprises agree, will surely not increase opex. Some are now saying it will reduce it. What other network change can claim those attributes? If networking is about moving bits, maybe it’s time to simply add more capacity to do that. Is this a trend? Should it be? The leaders in the add-capacity movement think we’ll answer that in 2025.

Source: Network World