Why redesign for 40 Gigabit Ethernet?
Trevor details why comparing Ethernet technology to other fabrics is especially timely now, as demand for greater I/O performance continues to trend upward.
Enterprise data centers, cloud computing, High Performance Computing (HPC), and embedded systems are all markets being pushed toward next-generation fabric performance. Increasing data demand in these markets is driving the need for greater I/O performance, pushing 10-gigabit fabric ports to 40 gigabits and beyond. With this drive come hardware redesign and an opportunity to reevaluate the incumbent Ethernet technology against competing fabrics. While 10 Gigabit Ethernet (10 GbE) has become ubiquitous in these markets, QDR and FDR InfiniBand, PCI Express Gen2 and Gen3, and RapidIO Gen2 are worthy challengers.
InfiniBand has proven very successful, with a wide variety of switching, Host Bus Adapter (HBA), and Network Interface Card (NIC) components, hardware, and software solutions, winning the highest-performance systems in these markets. RapidIO, like InfiniBand, exposes the underlying weaknesses of Ethernet technology and provides lower, more predictable network latency. Also like InfiniBand, RapidIO must support the coexistence of Ethernet and RapidIO in the same system. The ability to use native RapidIO networking as if it were Ethernet creates a compelling argument for using RapidIO in greenfield systems. Ecosystem support for 20 Gbps ports and Ethernet encapsulation makes RapidIO a logical choice when 10 GbE systems must be redesigned for next-generation throughput and performance.
An example of a RapidIO-based compute/server hardware implementation
Large port count non-blocking switch cards have already been developed using RapidIO Gen2 switches from Integrated Device Technology, Inc. (IDT). These switch cards could be deployed in data center/HPC environments. Figure 1 depicts a switch card implementation that might be leveraged for top-of-rack switching support.
The switch card shown in Figure 1, which might serve as a top-of-rack switch, leverages IDT’s twelve-4x-port, 240 Gbps fully non-blocking CPS-1848 RapidIO Gen2 switch. Such a switch card could be scaled to greater non-blocking capacity with more interconnect ports as needed. Each server/compute node/blade bridges PCIe to RapidIO via IDT’s Tsi721 protocol-conversion device. This architecture is highly scalable and could support inter-chassis connectivity more efficiently than PCIe. Figure 1 highlights the use of x86 processors, but the processor choice could just as easily be any other General Processing Unit (GPU)/processor type that supports the PCIe Gen2/Gen1 interface.
One choice for bridging server nodes onto the interconnect is IDT’s Tsi721 PCIe Gen2-to-RapidIO Gen2 protocol-conversion bridge. The Tsi721 converts PCIe to RapidIO, and vice versa, with full line-rate bridging at 20 Gbaud. Using the Tsi721, designers can build heterogeneous systems that leverage the peer-to-peer networking performance of RapidIO while using multiprocessor clusters that may only be PCIe-enabled. The Tsi721’s full line-rate block Direct Memory Access (DMA) and Messaging Engines allow large amounts of data to be transferred efficiently without processor involvement. RapidIO Type 9 messaging’s support for virtualization, combined with performance higher than 10 GbE’s, also makes it possible to reduce cabling. This example system serves as the reference for discussion through the remainder of this article.
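The 20 Gbaud versus 16 Gbps figures are related by RapidIO Gen2’s 8b/10b line coding, in which every 8 data bits are transmitted as 10 line bits. A quick sanity check:

```python
def effective_gbps(baud_gbaud: float, data_bits: int = 8, code_bits: int = 10) -> float:
    """Effective data throughput of an 8b/10b-coded serial link."""
    return baud_gbaud * data_bits / code_bits

# A 4x RapidIO Gen2 port: 4 lanes x 5 Gbaud = 20 Gbaud raw line rate
raw = 4 * 5.0
print(effective_gbps(raw))  # 16.0 Gbps of payload-carrying bandwidth
```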
For this example system, messaging between a pair of nodes is supported by two separate mechanisms: Logical I/O and Messaging. Logical I/O transactions are supported through direct bridging translation as well as the Block DMA Engines; like Ethernet-based systems, the example system thus supports DMA and Remote DMA (RDMA). Messaging is handled by the Messaging Engines, whereas Ethernet messaging runs over the Transmission Control Protocol (TCP) and other protocols. RapidIO supports a single I/O Virtualization (IOV) domain per channel, which is less than available Ethernet solutions offer.
Ethernet offers a means of delivering peer-to-peer traffic in processor networks, be it chip-to-chip, board-to-board, or between chassis. However, Ethernet evolved from LANs and WANs, leaving architects the task of finding an efficient way to use it in embedded systems. The fabric’s LAN/WAN heritage has led to the assumption that a processor is available at each node to terminate the protocol stack. While this arrangement is reasonable for LANs and WANs, it introduces too much latency and power consumption in real-time embedded systems (including servers).
PCI and PCIe standards offer an alternative; however, they were really designed for monolithic single host processor systems, with the concept of a root complex. Scaling to multiple processors on line cards, over backplanes, with multiple hosts becomes difficult even with non-transparent bridging. The problem can be managed for a small number of endpoints or computing nodes, but the memory mapping becomes difficult very quickly as systems scale in size.
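The scaling problem can be made concrete with a rough count: if each of N PCIe nodes needs a non-transparent address-translation window to every other node, window requirements grow quadratically. This toy model illustrates the scaling argument only; it is not any vendor’s actual NTB programming interface:

```python
def ntb_windows(num_nodes: int) -> int:
    """Total address-translation windows needed if every node maps
    every other node's memory through non-transparent bridging."""
    return num_nodes * (num_nodes - 1)

# Window count explodes as the system scales
for n in (4, 16, 64):
    print(n, ntb_windows(n))  # 4 -> 12, 16 -> 240, 64 -> 4032
```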
Built from the ground up for multiprocessor peer-to-peer networks, RapidIO attributes include:
- Reliable communication
- Sub-microsecond end-to-end packet delivery
- 100ns switch cut-through latency
- No processor overhead to terminate the protocol
- Support for “any topology” direct interconnect, mesh, star, dual star, and the like
- High performance messaging for transmitting large amounts of data
- Push architecture with the option of every processor in the system having its own memory subsystem
RapidIO has become the leading embedded interconnect, and with its carrier-class serial communication specified for backplane connectivity, it is able to natively support intra-board, inter-board, and cabled chassis-to-chassis connectivity within a room or between rooms.
Subsidiary specifications were developed to make Ethernet better serve the embedded space and extend beyond its wide area network and local area network roots. These enhancements, targeted at data center environments, are collectively defined as Data Center Bridging (DCB) and are characterized by lossless transport, improved flow control, and low latency: properties that both the embedded and data center spaces require.
Quality of Service (QoS) and flow control comparison
One of the drivers for increased bandwidth in the enterprise data centers and the cloud is the need to combine the storage network, typically running up to 8 Gbps for Fibre Channel, with the inter-server connectivity network, which is typically 1 Gbps Ethernet. These networks have different QoS constraints. Additionally, storage networks must not discard packets. RapidIO-based systems achieve reliable delivery with predictable QoS today.
For applications that require more aggressive and effective QoS, RapidIO offers advanced flow control and data plane capabilities. The RapidIO protocol defines multiple flow control mechanisms at the physical and logical layers. By managing physical-layer flow control at the link layer, short-term congestion events are effectively managed using both receiver- and transmitter-controlled flow control. Longer-term congestion may be controlled at the logical layer using XOFF and XON messages, which enable the receiver to stop the flow of packets when congestion is detected along a particular flow.
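A minimal sketch of the XOFF/XON idea, modeling a receiver that pauses one flow when a buffer high-watermark is crossed and resumes it after draining (the watermark values and class names are illustrative, not taken from the RapidIO specification):

```python
class FlowReceiver:
    """Toy model of logical-layer XOFF/XON flow control for one flow."""

    def __init__(self, capacity=8, xoff_at=6, xon_at=2):
        self.buf, self.capacity = [], capacity
        self.xoff_at, self.xon_at = xoff_at, xon_at
        self.paused = False  # True after XOFF sent, until XON

    def receive(self, pkt):
        """Accept a packet; returns False once XOFF has been signaled."""
        self.buf.append(pkt)
        if not self.paused and len(self.buf) >= self.xoff_at:
            self.paused = True   # send XOFF upstream: stop this flow
        return not self.paused

    def drain(self):
        """Consume one packet; signal XON when below the low watermark."""
        if self.buf:
            self.buf.pop(0)
        if self.paused and len(self.buf) <= self.xon_at:
            self.paused = False  # send XON upstream: resume this flow

rx = FlowReceiver()
for i in range(6):
    rx.receive(i)        # sixth packet crosses the XOFF watermark
print(rx.paused)          # True: transmitter told to stop this flow
for _ in range(4):
    rx.drain()            # draining below the XON watermark resumes it
print(rx.paused)          # False
```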
Virtual Channels (VCs) support new QoS capabilities. Features here provide reliable or best-effort delivery policies, enhanced link-layer flow control, and end-to-end traffic management. VCs also allow up to 16 million unique virtual streams between any two endpoints.
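The 16 million figure is consistent with a stream identifier combined with a class-of-service field; assuming a 16-bit stream ID and an 8-bit class of service, the arithmetic works out as follows:

```python
# Assumed field widths: 16-bit stream ID plus 8-bit class of service (CoS)
STREAM_ID_BITS = 16
COS_BITS = 8

unique_streams = 2 ** (STREAM_ID_BITS + COS_BITS)
print(unique_streams)  # 16777216 -- roughly "16 million" virtual streams
```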
Ethernet has also improved its flow control story through pursuit of Data Center Bridging (DCB) technology, which allows one end of the link to halt transmission by the other end of the link in order to avoid buffer overflow and resulting packet discard. The simplified routing of packets made possible by Virtual Local Area Network (VLAN) tagging, as well as the prioritization of packets which is part of the VLAN tag, have also done much to improve Ethernet latency and QoS characteristics.
However, DCB has multiple QoS and flow-control limitations compared with RapidIO (see “Comparing Ethernet and RapidIO,” http://opsy.st/gBSePR). For example, Ethernet’s flow control support comes primarily from 802.3x PAUSE. Even with enhanced flow control mechanisms, congestion-notification overhead is high, as the notification must propagate from the source to the edge of the network, whereas RapidIO signals congestion quickly through control-symbol transmission. The newer Ethernet mechanisms are arguably not widely adopted, and some vendors offer only proprietary support for limited topologies. RapidIO’s link-layer flow control lets the transmitter keep the receiver’s buffers continuously full, which improves scheduling efficiency and thereby overall switching efficiency.
Performance comparison: latency and throughput
The latency of Ethernet switch devices continues to decrease, to the point where industry-leading Ethernet switches now have around 200 ns of latency. RapidIO switch device latency, however, is under 100 ns and is decreasing similarly. This trend will continue for both technologies as companies adopt smaller silicon process geometries and higher physical-layer speeds. End-to-end packet termination may take longer than 10 µs for Ethernet but stays under 1 µs for RapidIO.
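These per-hop and termination figures compound along a path. A back-of-the-envelope budget using the article’s approximate numbers for a three-switch path (ignoring wire and serialization delay):

```python
def path_latency_ns(hops: int, switch_ns: float, termination_ns: float) -> float:
    """End-to-end latency: cut-through delay at each switch plus
    protocol termination at the destination endpoint."""
    return hops * switch_ns + termination_ns

# Approximate figures from the text, for a 3-hop (e.g. leaf-spine-leaf) path
print(path_latency_ns(3, 200, 10_000))  # Ethernet: 10600.0 ns
print(path_latency_ns(3, 100, 1_000))   # RapidIO:   1300.0 ns
```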
RapidIO offers guaranteed delivery via link-layer error recovery. Link-layer control symbols minimize control loop latency. Embedding control symbols in packets further shrinks latency. Lossless Data Center Ethernet (DCE) still requires offload engines and/or software stacks, which tend to add latency.
RapidIO Gen2 switches offer 20 Gbps per port. The Tsi721 converts data from PCIe to RapidIO, and vice versa, and bridges at the full line-rate of 16 Gbps for packets as small as 64 bytes. This is higher than the generally available 10 GbE, but certainly less than the increasing number of 40 GbE approaches becoming available.
From a raw bandwidth perspective, Ethernet is ahead of RapidIO. However, with the announcement of the RapidIO 10xN roadmap (http://www.compactpci-systems.com/news/id/?27341), the pending availability of those physical specifications, and future plans toward 25xN, RapidIO bandwidth will close the gap with Ethernet. RapidIO performance and protocol efficiency allow robust protocol encapsulation. Messaging and/or data streaming provide native Ethernet encapsulation capabilities.
Security comparison
The Tsi721’s RapidIO receive-side security features are enforced in hardware and can determine whether or not a given set of destination IDs will be accepted. Each RapidIO 2.1-defined packet type can be either accepted or discarded. Transmit-side security must be enforced by software.
The system host enforces switch fabric security. It is not possible for one node to communicate with another unless the fabric routing tables are configured to allow packets to be routed between the nodes. Each switch port also has four filters that can mask and match up to the first 20 bytes of any packet and discard it. This capability can be used to enforce address ranges and destIDs for DMA read/write transactions, as well as to prevent any node other than the system host from querying or configuring the switch fabric.
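A mask-and-match filter of this kind can be sketched as a bitwise comparison over the first 20 bytes of a packet; the representation below is illustrative and does not reflect IDT’s actual register layout:

```python
def filter_matches(pkt: bytes, mask: bytes, match: bytes) -> bool:
    """True if the packet's first 20 bytes, under the mask, equal the
    match pattern -- such packets would be discarded by the port."""
    head = pkt[:20].ljust(20, b"\x00")  # pad short packets for comparison
    return all((h & m) == (v & m) for h, m, v in zip(head, mask, match))

# Hypothetical rule: discard packets whose first header byte is 0xFF
mask = b"\xff" + b"\x00" * 19
match = b"\xff" + b"\x00" * 19
print(filter_matches(b"\xff\x01\x02", mask, match))  # True -> discard
print(filter_matches(b"\x10\x01\x02", mask, match))  # False -> forward
```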
Power and cost comparison
RapidIO is arguably more power efficient than Ethernet, as RapidIO’s physical layer ensures reliable, in-order delivery of messages in hardware, replacing the transport-layer protocols Ethernet requires for the same job. Ethernet’s higher protocol overhead raises the power consumed per datum transmitted.
Ethernet providers charge a premium for 10 Gbps Ethernet ports, and an even greater premium for 40 Gbps ports. System-level pricing for 10 Gbps Ethernet silicon may run to hundreds of dollars per port, with 10 G switch-device ports costing more than $10 each in volume. Aided by simpler packet termination, smaller packet memories, and no classification requirements, among other factors, RapidIO system port costs are approximately $55, with switch device costs less than $4 per 10 G port in volume.
From an ecosystem perspective, Ethernet, now more than 30 years old, has a large advantage over the RapidIO ecosystem, which is 10 years old and counting. Multiple suppliers exist for silicon, platforms, tools, and software. Ethernet’s hardware ecosystem offers converged network cards, switch and router platforms, and server and storage platforms; its software ecosystem offers network management software, Microsoft Windows, Linux, and a wide variety of other choices. There are also protocol analyzers, packet sniffers, traffic generators, network testers, and a strong compliance-test story. RapidIO has strong support for embedded OSs, Linux, the OpenFabrics Enterprise Distribution (OFED), protocol analysis, system diagnostics, and multiple server platforms. RapidIO has demonstrated strong Gen1 compliance, and Gen2 compliance testing is under way. However, Windows, VMware, network adapter, NPU device, and storage platform support choices are still limited.
The redesign effort required to move from 10 GbE to 40 GbE opens greenfield opportunities for competing interconnect fabrics in enterprise servers, data centers, cloud computing, and HPC. Its 20 Gbps throughput makes RapidIO Gen2 a strong contender for these opportunities. System designers able to work within RapidIO’s smaller and less mature ecosystem can move to 20 Gbps today, with switching, PCIe bridging, and embedded CPU solutions all available now. The result is a highly scalable, reliable system: the advantages of a fault-tolerant, carrier-class design, with dramatically lower cost, power, and latency, plus superior flow control and QoS.