Hyperconverged Infrastructure (HCI) is a popular topic, so it's no surprise that the announcement of HyperFlex 3.5 to kick off Cisco Live created a lot of buzz throughout the show. The release brings many exciting enhancements to the HX platform, as highlighted in Vijay Venugopal's blog titled "HyperFlex 3.5 Takes the Stage at Cisco Live 2018 - Make Your Data Center Rock!", and it was the topic of many conversations I had during the week-long show. To me, though, the HX220c M5 All NVMe node is the most exciting part of the announcement because it represents a significant step for the hyperconverged market, and especially for Cisco, which is the first to bring a fully engineered all-NVMe node to market. NVMe drive technology is a performance game changer, so in this blog I want to discuss what it took to bring this technology to hyperconverged infrastructure.
NVMe drives have been shown to deliver impressive performance gains over SATA and SAS SSDs, which has driven their adoption in the server market. That raises the question:
Why hasn't this already been done in a hyperconverged solution?
Well, in simplest terms, it's because it involves much more than a drive qualification. SATA- and SAS-based SSDs require dramatically different server designs than NVMe drives, so bringing an all-NVMe node to market required platform-level hardware and software optimization. Some like to define HCI solutions solely by the software powering them, but what truly defines a solution is the outcomes it provides to end users. While software-only models offer customers more choice in hardware, they also leave engineering gaps. This is why software-only vendors have not yet delivered much more than listing NVMe drives on their hardware compatibility lists (HCLs). Cisco was able to be first to market with a complete product because we own the entire HyperFlex solution stack, from the node hardware and the HCI software to the networking. This fully engineered approach to HCI is what enabled Cisco to address the reliability, availability, and serviceability (RAS) challenges that the NVMe architecture introduces. It's also why Cisco was able to leverage our tight partnership with Intel not just to use Intel SSDs, but to incorporate other Intel innovations such as Intel Volume Management Device (VMD). We used Intel VMD as a key ingredient to overcome RAS challenges such as surprise drive removal errors and firmware management, and to enable features such as drive LED status lights and hot-pluggable NVMe drives.
HCI is a shared infrastructure, so the performance and customer outcomes it delivers are determined by all of the components in the cluster, not just the software powering the node. It's the sum of all its parts: the software, the hardware, and the networking. At the node level, NVMe drives not only enhance storage performance, they also drive other measurable platform-level enhancements, such as CPU performance and workload density, that maximize TCO benefits. Intel's James Myers published a great blog titled "Cisco HyperFlex*, Intel® Xeon® Scalable processors and Intel® Optane™ SSDs: Co-innovating a new class of performance in Hyperconverged Infrastructure" when the HX220c All NVMe node was announced, and I highly suggest giving it a read. However, overall performance has to be evaluated by taking a step back from the node. HCI is no different from any other infrastructure: adding new technology to increase performance in one area will not improve overall results unless the bottlenecks that would hold back those gains are also addressed.
The network has long been an overlooked priority in HCI solutions, many of which rely on commodity-grade 10GbE or even 1GbE switches designed for north-south traffic to carry the traffic across the cluster. Clustered solutions like HCI require much higher east-west bandwidth than traditional 3-tier architectures, so using networking equipment that is not optimized for that traffic limits both performance and functionality. Cisco recognized the vital role the network plays in HCI from the start and engineered HyperFlex to be the first HCI solution with truly integrated networking optimized for clustered environments. Using UCS Fabric Interconnects (FIs), HyperFlex networking provides high bandwidth for the east-west traffic flow needed to support a high-performance clustered environment. After all, if the network is slow, drive performance won't matter, so HX220c All NVMe clusters will include 40G FIs as standard equipment. Software-only HCI vendors have started to bundle faster switches into their offerings to keep up with performance requirements, but that can introduce further challenges. Simply bundling 25G or 40G switches may help cluster performance, but it can also cause interoperability problems with a prospective customer's existing network. UCS FIs are designed to integrate easily into existing environments through 40GbE or even 10GbE connections to the core network, eliminating those interoperability challenges.
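To make the "slow network caps drive performance" argument concrete, here is a rough back-of-the-envelope sketch in Python that compares a link's nominal bandwidth with a node's aggregate drive throughput. The function names and the 5 GB/s drive figure are hypothetical illustrations, not HyperFlex specifications, and the sketch ignores protocol overhead, replication traffic, and link bonding.

```python
# Back-of-the-envelope sketch: how much of a node's aggregate drive
# throughput a single network link could carry. All figures are
# hypothetical inputs, not measured HyperFlex numbers; protocol overhead,
# replication traffic, and link bonding are deliberately ignored.


def link_gbytes_per_s(link_gbits_per_s: float) -> float:
    """Convert a nominal link rate in Gb/s to GB/s (8 bits per byte)."""
    return link_gbits_per_s / 8.0


def network_coverage(aggregate_drive_gbytes_per_s: float, link_gbits_per_s: float) -> float:
    """Fraction of the drives' throughput the link can carry, capped at 1.0."""
    if aggregate_drive_gbytes_per_s <= 0:
        raise ValueError("aggregate drive throughput must be positive")
    ratio = link_gbytes_per_s(link_gbits_per_s) / aggregate_drive_gbytes_per_s
    return min(ratio, 1.0)


if __name__ == "__main__":
    # Hypothetical node whose drives can sustain 5 GB/s in aggregate.
    drives_gbytes_per_s = 5.0
    for link in (10, 25, 40):
        share = network_coverage(drives_gbytes_per_s, link)
        print(f"{link}GbE link: carries {share:.0%} of drive throughput")
```

With the hypothetical 5 GB/s figure, a 10GbE link (1.25 GB/s nominal) can carry at most 25% of what the drives could deliver, while a 40GbE link (5 GB/s nominal) matches it. Real clusters must also carry east-west replication and other traffic, so treat this only as an illustration of the bottleneck argument.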
So what's the future of all NVMe? Simple: increase workload density, accelerate workload performance, consolidate more, and continue to drive down TCO. The good news for those of you who have been anxiously waiting for an all-NVMe solution and are ready to buy is that the HX220c All NVMe node is built with investment protection in mind. In the future, you can expect support for higher-density NVMe drives, as well as continued software optimizations that streamline the I/O path to further boost performance, support for 2U rack nodes that can host accelerators, and support for persistent memory. All of this will be supported on the current platform.
Business demands placed on IT departments to deliver resources faster, more efficiently, and more cost-effectively have never been greater, and HCI has established itself as a leading solution to meet those challenges. Since coming to market, HCI has seen an incredible evolution, from simplifying infrastructure for tier-2 applications to increasing adoption for tier-1 mission-critical enterprise applications. However, business demands continue to increase, and so do the demands placed on the IT infrastructure that supports them. Customers are demanding lower read/write latencies to increase end-user productivity and higher workload density to lower costs, and Cisco has listened. The time has come to usher in a new era of HCI performance and usability with the HX220c All NVMe.