The network infrastructure market is the most dynamic it has been in decades, and AI is set to make it even more so.
AI is a voracious consumer of data, whether it’s fueling large language models (LLMs) in hyperscale clouds or working at the edge, where private infrastructure must collect data and securely transmit it to destinations of all types for a variety of applications. The result is increased demand for network connectivity.
What’s exciting about AI is that it’s not only creating new markets for network infrastructure hardware and software but also revitalizing traditional markets like data center and enterprise networking with new demand for data.
This has led a number of networking players to enter markets that have been relatively static for decades. Cisco has dominated networking since the days of the Internet bubble, with an estimated 50-60% share of the enterprise and data center networking market, and that lack of competitive pressure left the market stagnant. Things have started to change in recent years, however, with competitors such as Arista Networks gaining share among cloud hyperscalers. The impending merger of Juniper Networks and HPE adds another twist: the combination could claim the No. 2 spot in networking, and as Juniper strengthens its AI networking roadmap, it becomes a more strategic asset for HPE. At the same time, NVIDIA, the leader in chips for AI infrastructure, has built its own complete AI-optimized network stack, putting it ahead of networking incumbents for AI workloads such as hyperscaler LLM training.
Networking innovation is also plentiful. Startups like Arrcus and DriveNets are tackling AI with distributed hardware and cloud-scale network operating system (NOS) approaches. Hedgehog and Aviz Networks are leveraging cloud tools such as the open source Software for Open Networking in the Cloud (SONiC) NOS and Kubernetes. And as AI requires more connections to data, we expect it to drive multicloud networking, with notable startups including Alkira, Aryaka, Aviatrix, Graphiant, Itential, and Prosimo.
This is all great for the market. Network buyers now have more choice than ever before: a complete AI networking stack from NVIDIA, the leader in AI infrastructure; best-of-breed networking from established companies like Cisco and HPE/Juniper; or innovative startup solutions.
We’ll talk more about the competition later, but first let’s look at why AI networks have different requirements.
Why AI Networking Is a New Market
AI applications will likely take many forms, from huge cloud LLMs to use cases such as small language models (SLMs) running in private clouds for specific vertical applications. For example, AI can power general-purpose chatbots that help with conversation and writing, but it can also be used to develop medicines with customized data or to optimize manufacturing sites.
The first thing to understand is that AI networking often comes with different requirements than traditional networking. The transition from general-purpose to accelerated computing requires new software and distributed networking architectures to connect, move, and process data at lightning speeds with very low latency and little tolerance for data loss. This isn’t networking at your local coffee shop.
The race to build huge LLM clouds is also driving demand for specialized processors such as SmartNICs, IPUs, and DPUs, which offload and accelerate networking, security, and storage functions in AI networks. But there are other areas to watch: network players are using different architectures, software, and components to build more economical infrastructure for accessing AI models, whether at the edge or in the cloud. Whether connecting chips in supercomputers, interconnecting servers in AI clusters, or linking those clusters to the network edge, existing technologies will need to evolve to deliver the performance AI applications demand.
Futuriom recently produced an in-depth report on AI networking, based on several months of research into end-user requirements for AI workloads. The market is already segmenting into two categories:
1) Training: This is the step in which LLMs such as ChatGPT, Llama, Claude AI, and Mistral are built by iteratively adjusting billions of neural-network parameters across massive datasets, producing systems that recognize words, images, speech, and more. These LLMs are the foundation of AI applications. SLMs also require their own network solutions.
2) Inference: This is the process of applying a trained LLM or SLM to a specific dataset to create an AI application that provides information, solves a specific problem, or completes a task. For example, a bank might run Claude AI against anonymized data from multiple transactions to streamline customer service at ATMs. Often referred to as the “front end” of AI, inference also requires processing and network capabilities closer to the customer. (A simplified code sketch contrasting the two phases follows.)
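To make the distinction concrete, here’s a minimal sketch in Python using PyTorch (our illustrative choice; the tiny linear model simply stands in for a real language model, and none of this is tied to any vendor’s stack). Training iterates over data and updates parameters, which is what generates the heavy east-west traffic between GPUs in a cluster; inference is a single forward pass per request, shifting demand toward low-latency connectivity near users.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)  # toy stand-in for a real language model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: forward pass, loss, backward pass, parameter update, repeated
# at enormous scale. In a real cluster each step also synchronizes
# gradients across GPUs, which is what stresses the network fabric.
for _ in range(100):
    x = torch.randn(32, 16)         # toy batch of input features
    y = torch.randint(0, 2, (32,))  # toy labels
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Inference: weights are frozen; each request is one forward pass, so the
# networking problem becomes latency and data access close to the user.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 16)).argmax(dim=1)
```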
Both training and inference require capabilities not present in traditional general-purpose client-server networks, nor in the High Performance Computing (HPC) networks rooted in that paradigm.
Emerging needs include, but are not limited to, higher capacity (scaling to 400 Gb/s and 800 Gb/s), higher throughput, lower latency, improved reliability, faster access to storage, optimized clustering, and higher compute utilization.
The Competition Begins!
As AI continues to capture the business world’s attention with its productivity gains and potential for new digital products, it’s no surprise that there is excitement around building AI infrastructure. But the revenue and productivity gains have yet to materialize, and this is likely to be a cycle measured in years, or even decades, bringing changes in business models and architectures along the way.
The AI networking market is estimated at around 10-15% of total AI infrastructure budgets, and while it will undoubtedly reach into the billions, it’s starting from a small base. Arista Networks CEO Jayshree Ullal has publicly stated that she expects $750 million in networking revenue directly tied to AI builds next year, a figure expected to grow rapidly.
The AI networking market has often been framed as InfiniBand vs. Ethernet, because NVIDIA built an early lead connecting GPUs using InfiniBand, a technology prized for its low latency and lossless delivery. However, Ethernet solutions are now coming to market, and NVIDIA itself offers Ethernet-based technology in its Spectrum-X platform. AI networking will expand as more Ethernet-based solutions arrive. SLMs can run in a variety of business verticals and don’t require as much power as LLMs; they can also be deployed in private data centers and infrastructure, where Ethernet is widely deployed and well understood and benefits from the economies of scale of widely available components.
For this reason, Ethernet is being adapted to meet AI networks’ needs for low-latency, lossless communication. In a sense, this means bringing Ethernet closer to InfiniBand while still leveraging Ethernet economics. A number of vendors have come together to form the Ultra Ethernet Consortium (UEC), which is tasked with introducing upgrades to the Ethernet standard that make it suitable for demanding AI environments, large and small. Ethernet has already been adapted with Remote Direct Memory Access (RDMA) over Converged Ethernet (RoCE), and it will evolve further. Most network vendors support RoCEv2 and pair it with enhancements such as Data Center Quantized Congestion Notification (DCQCN), a technique that combines Priority Flow Control (PFC) and Explicit Congestion Notification (ECN), along with smart queuing and buffer management. Some vendors are also layering AI and ML on top of RoCEv2 to improve overall performance.
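To give a flavor of how DCQCN behaves, here’s a deliberately simplified Python model of its sender-side rate control, loosely following the published algorithm (Zhu et al., SIGCOMM 2015). The constants, the single-flow framing, and the class name are our assumptions for readability, not any vendor’s implementation.

```python
class DcqcnSender:
    """Toy model of a DCQCN reaction point: cut the sending rate when
    congestion notifications arrive, then recover toward a target rate."""

    def __init__(self, line_rate_gbps: float = 400.0):
        self.rate = line_rate_gbps    # current sending rate
        self.target = line_rate_gbps  # rate to recover toward
        self.alpha = 1.0              # running estimate of congestion severity
        self.g = 1.0 / 16             # alpha update gain (assumed value)

    def on_cnp(self) -> None:
        """A Congestion Notification Packet arrived, meaning the switch
        ECN-marked our traffic: cut the rate multiplicatively."""
        self.target = self.rate
        self.rate *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_interval(self) -> None:
        """No CNPs for a full interval: decay alpha and recover the rate
        halfway back toward the target (the fast-recovery phase)."""
        self.alpha = (1 - self.g) * self.alpha
        self.rate = (self.rate + self.target) / 2


# Example: one burst of congestion, then several quiet intervals.
sender = DcqcnSender()
sender.on_cnp()
for _ in range(5):
    sender.on_quiet_interval()
print(f"rate after recovery: {sender.rate:.1f} Gb/s")
```

The key idea is that ECN marks at a congested switch become notifications back to the sender, which slows down before queues overflow, avoiding both packet drops and the head-of-line blocking that heavy reliance on PFC alone can cause.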
Open networking also has much to offer, allowing customers to build their own networks by mixing and matching NOSs and hardware from different vendors. Chipmakers Broadcom, Marvell, and Intel offer strong merchant silicon portfolios that let networking professionals pair off-the-shelf hardware with the NOS of their choice, such as the open source SONiC.
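As a concrete illustration of that disaggregation, here’s a minimal, hypothetical fragment of the declarative port configuration SONiC consumes (its config_db), rendered from Python. The field names follow SONiC’s PORT table conventions, but the port name, lane mapping, and values are illustrative assumptions, not taken from any specific platform.

```python
import json

# Hypothetical SONiC-style config_db fragment for a single switch port.
config_db_fragment = {
    "PORT": {
        "Ethernet0": {
            "lanes": "0,1,2,3,4,5,6,7",  # SerDes lanes on the switch ASIC (assumed)
            "speed": "400000",           # 400 Gb/s, the class AI fabrics are scaling to
            "mtu": "9100",               # jumbo frames, typical for RoCE traffic
            "admin_status": "up",
        }
    }
}

print(json.dumps(config_db_fragment, indent=2))
```

Because the NOS is decoupled from the ASIC, the same style of configuration can drive switches built on silicon from Broadcom, Marvell, or Intel.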
Large, established networking vendors such as Arista, Cisco, Broadcom, Juniper, HPE, and Nokia have joined the UEC in pursuit of these goals. The Juniper-HPE merger has received a lot of attention within the group, as the combined vendor is expected to rank second in networking market share behind Cisco.
AI networking also offers new opportunities for startups, including vendors with technology rooted in SONiC, such as Aviz Networks and Hedgehog, as well as startups focused on scale-out, distributed systems based on their own NOS, such as Arrcus and Israel-based DriveNets, which already offers hyperscale routing solutions for the telecommunications market.
There are still plenty of vendors to keep an eye on in this burgeoning space. For example, startup Enfabrica offers a compute-to-compute interconnect switch for AI servers that acts as a high-bandwidth “crossbar for NICs” to power compute, network, and memory connections within a cluster. And multicloud networking and Network as a Service (NaaS) vendors like Alkira, Aryaka, Aviatrix, Itential, and Prosimo are making it easier for organizations to build secure network connections to move data to and from AI sources.
The boom in AI networking will also fuel the optical market, which will need high-speed optics to support the surge in bandwidth. Here, Ciena, a leader in the optical equipment market, has an opportunity to accelerate data center interconnect with its position in coherent optics. Thailand-based Fabrinet, like rivals Coherent and Lumentum, is a favorite of AI investors as it sees big growth in optics for AI applications. Fiber-optic maker Corning’s shares recently surged 10% after a pre-earnings release in which it raised its second-quarter sales forecast by about $200 million, mainly on better-than-expected demand for fiber connections in data centers running AI applications. This is also an area where Cisco is well positioned, with its own optics that it can package with its Silicon One chip platform.
Putting all this together makes for a huge and interesting scrum for AI infrastructure networking leadership, with lots of twists and turns to look forward to. Networking is cool again. Grab the popcorn!