Building Ai/ml Networks With Cisco Silicon One

Servidores servidores

It's evident from the amount of news coverage, articles, blogs, and water cooler stories that artificial intelligence (AI) and machine learning (ML) are changing our society in fundamental ways-and that the industry is evolving quickly to try to keep up with the explosive growth.

Unfortunately, the network that we've used in the past for high-performance computing (HPC) cannot scale to meet the demands of AI/ML. As an industry, we must evolve our thinking and build a scalable and sustainable network for AI/ML.

Today, the industry is fragmented between AI/ML networks built around four unique architectures: InfiniBand, Standard Ethernet, Enhanced Ethernet, and Scheduled Ethernet.

Each technology has its pros and cons, and various tier 1 web scalers view the trade-offs differently. This is why we see the industry moving in many directions simultaneously to meet the rapid large-scale buildouts occurring now.

This reality is at the heart of the value proposition ofCisco Silicon One.

Customers can deploy Cisco Silicon One to power their AI/ML networks and configure the network to use Standard Ethernet, Enhanced Ethernet, or Scheduled Ethernet. As workloads evolve, they can continue to evolve their thinking with Cisco Silicon One's programmable architecture.

Figure 1. Flexibility of Cisco Silicon One

All other silicon architectures on the market lock organizations into a narrow deployment model, forcing customers to make early buying time decisions and limiting their flexibility to evolve.

Cisco Silicon One, however, gives customers the flexibility to program their network into various operational modes and provides best-of-breed characteristics in each mode. Because Cisco Silicon One can enable multiple architectures, customers can focus on the reality of the data and then make data-driven decisions according to their own criteria.

Figure 2. AI/ML network solution space

To help understand the relative merits of each of these technologies, it's important to understand the fundamentals of AI/ML. Like many buzzwords, AI/ML is an oversimplification of many unique technologies, use cases, traffic patterns, and requirements. To simplify the discussion, we'll focus on two aspects: training clusters and inference clusters.

Training clusters are designed to create a model using known data. These clusters train the model. This is an incredibly complex iterative algorithm that is run across a massive number of GPUs and can run for many months to generate a new model.

Inference clusters, meanwhile, take a trained model to analyze unknown data and infer the answer. Simply put, these clusters infer what the unknown data is with an already trained model. Inference clusters are much smaller computational models. When we interact with OpenAI's ChatGPT, or Google Bard, we are interacting with the inference models. These models are a result of a very significant training of the model with billions or even trillions of parameters over a long period of time.

In this blog, we'll focus on training clusters and analyze how the performance of Standard Ethernet, Enhanced Ethernet, and Scheduled Ethernet behave. I shared further details about this topic in my OCP Global Summit, October 2022 presentation.

AI/ML training networks are built as self-contained, massive back-end networks and have significantly different traffic patterns than traditional front-end networks. These back-end networks are used to carry specialized traffic between specialized endpoints. In the past, they were used for storage interconnect, however, with the advent of remote direct memory access (RDMA) and RDMA over Converged Ethernet (RoCE), a significant portion of storage networks are now built over generic Ethernet.
Today, these back-end networks are being used for HPC and massive AI/ML training clusters. As we saw with storage, we are witnessing a migration away from legacy protocols.

The AI/ML training clusters have unique traffic patterns compared to traditional front-end networks. The GPUs can fully saturate high-bandwidth links as they send the results of their computations to their peers in a data transfer known as the all-to-all collective. At the end of this transfer, a barrier operation ensures that all GPUs are up to date. This creates a synchronization event in the network that causes GPUs to be idled, waiting for the slowest path through the network to complete. The job completion time (JCT) measures the performance of the network to ensure all paths are performing well.

Figure 3. AI/ML computational and notification process

This traffic is non-blocking and results in synchronous, high-bandwidth, long-lived flows. It is vastly different from the data patterns in the front-end network, which are primarily built out of many asynchronous, small-bandwidth, and short-lived flows, with some larger asynchronous long-lived flows for storage. These differences along with the importance of the JCT mean network performance is critical.

To analyze how these networks perform, we created a model of a small training cluster with 256 GPUs, eight top of rack (TOR) switches, and four spine switches. We then used an all-to-all collective to transfer a 64 MB collective size and vary the number of simultaneous jobs running on the network, as well as the amount of network in the speedup.

The results of the study are dramatic.

Unlike HPC, which was designed for a single job, large AI/ML training clusters are designed to run multiple simultaneous jobs, similarly to what happens in web scale data centers today. As the number of jobs increases, the effects of the load balancing scheme used in the network become more apparent. With 16 jobs running across the 256 GPUs, a Scheduled Ethernet results in a 1.9x quicker JCT.

Figure 4. Job completion time for Ethernet versus fully scheduled fabric

Studying the data another way, if we monitor the amount of priority flow control (PFC) sent from the network to the GPU, we see that 5% of the GPUs slow down the remaining 95% of the GPUs. In comparison, a Scheduled Ethernet provides fully non-blocking performance, and the network never pauses the GPU.

Figure 5. Network to GPU flow control for Ethernet versus fully scheduled fabric with 1.33x speedup

This means that for the same network, you can connect twice as many GPUs for the same size network with Scheduled Ethernet. The goal of Enhanced Ethernet is to improve the performance of Standard Ethernet by signaling congestion and improving load balancing decisions.

As I mentioned earlier, the relative merits of various technologies vary by each customer and are likely not constant over time. I believe Standard Ethernet, or Enhanced Ethernet, although lower performance than Scheduled Ethernet, are an incredibly valuable technology and will be deployed widely in AI/ML networks.

So why would customers choose one technology over the other?

Customers who want to enjoy the heavy investment, open standards, and favorable cost-bandwidth dynamics of Ethernet should deploy Scheduled Ethernet for AI/ML networks. Scheduled Ethernet is also great for customers who want to save cost and power by removing network elements, yet still achieve the same performance as Ethernet, with 2x more compute for the same network.
They can improve the performance by investing in telemetry and minimizing network load through careful placement of AI jobs on the infrastructure.

Customers who want to enjoy the full non-blocking performance of an ingress virtual output queue (VOQ), fully scheduled, spray and re-order fabric, resulting in an impressive 1.9x better job completion time, should deploy Scheduled Ethernet for AI/ML networks. Scheduled Ethernet is also great for customers who want to save cost and power by removing network elements, yet still achieve the same performance as Ethernet, with 2x more compute for the same network.

Cisco Silicon One is uniquely positioned to provide a solution for either of these customers with a converged architecture and industry-leading performance.

Figure 6. Evolve your network with Cisco Silicon One

Cisco Price, Dell Price, Huawei Price, ZTE HPE Fortinet Switch Router Server At Low Price

Servidores servidores

Noticias calientes

S5735-L48LP4S-A-V2 Powers Smarter Campus Networks with Advanced PoE and Cloud Management

S5735-L24T4X-A1 Empowers Installers with Scalable, Reliable, and Efficient Network Access

Best Ethernet Switches for Business (2025): Selection Guide and Top Picks

Huawei S5735-L24T4S-A1: A Compact, Stackable Access Switch Built for the Future

Huawei S5735-L24T4S-A: High-Performance Stacking Meets Zero-Noise Deployment

S5735-L24P4XE-A-V2: Huawei’s Smart Choice for High-Density Campus Deployments

S5735-L24P4X-A1: Huawei’s High-Performance Access Switch Redefining Campus Networking

Huawei S5735-L24P4S-A1 Review: Reliable Gigabit Access with Enterprise-Grade Features

What Is an Orthogonal Architecture?

Huawei s5735-l24p4s-a-v2 Delivers Scalable, Secure, and Smart PoE Access for Modern IT Infrastructures

Huawei S5735-L48T4XE-A-V2 Switch Delivers Enterprise-Grade Performance in a Compact Design

Huawei S5735-L48P4XE-A-V2 Review: Versatile Campus Switch with iStack and Full L3 Support

Differences Between Huawei CE Series and S Series Switches

Huawei CloudEngine S5735 Switches Set the Benchmark for High-Performance, Energy-Efficient Switching

Huawei CloudEngine S5731‑S48P4X Datasheet

Huawei CloudEngine S5731‑S24P4X Datasheet

Huawei S5731-S Empowers Next-Generation Campus Networks with Advanced Capabilities

Huawei S5731-H24P4XC Switch Review: Power-Packed Performance and Smart PoE

Huawei S5731-H Series Switches Redefine Campus Networking with Intelligent High-Performance Architecture

Top Features of the Huawei S5731-S24T4X: The Ultimate Gigabit Access Switch for Modern Networks

General Power Module Fault Location Procedure (CE8800 & 7800 & 6800 & 5800)

How Do I Split a Stack? How to clear the stacking configuration?

Huawei CloudEngine S5731 Datasheet

Huawei CloudEngine S5731-S24P4X: Powerful Enterprise-Grade Switch Explained

Huawei S5731-S48T4X Review: Powerful Enterprise Switch for High-Speed Networking

Why are network cables limited to 100 meters?

Huawei S5731-S32ST4X: Powerful, Enterprise-Ready Gigabit Switch with Advanced Capabilities

Huawei S5731-H48T4XC Review: High-Performance Switching for Modern IT Infrastructures

Huawei S5731-H48P4XC: Comprehensive Overview

Common display Commands for Huawei Devices

Building AI/ML Networks with Cisco Silicon One

Learn more:

Read:AI/ML white paper

Visit: Cisco Silicon One

Etiquetas calientes: Artificial Intelligence (AI) Cisco Silicon One Cisco AI/ML ML SP360

Ordering Guide

Recursos recursos

Sobre nosotros

Huawei CloudEngine S5731‑S48P4X Datasheet