MRC Protocol Deep Dive: The New Paradigm for 100K+ GPU Cluster Networking
OpenAI, together with AMD, Broadcom, Intel, Microsoft, and NVIDIA, has open-sourced the MRC (Multipath Reliable Connection) network protocol through the Open Compute Project. Designed for 100K+ GPU AI training clusters, MRC leverages SRv6 source routing, multipath packet spraying, and multi-plane architecture to compress failover time from seconds to microseconds and flatten the switching hierarchy from 3-4 tiers to 2. Already deployed at Oracle Abilene and Microsoft Fairwater datacenters, MRC signals a shift from general-purpose to purpose-built networking for AI training, with profound implications for network equipment vendors, chipmakers, and cloud providers.