Session Date/Time: 25 Jul 2025 07:30
rift
Summary
The IETF RIFT working group meeting covered the publication of RFC 9692, updates on the KV store and segment routing documents, a discussion of multicast in AI/ML environments, and a proposal for dragonfly routing. The discussion highlighted the increasing relevance of RIFT in data center networking.
Key Discussion Points
- RFC 9692 Publication: RFC 9692, the RIFT protocol specification, has been published.
- KV Store: The KV store document has completed shepherd review and will be put to last call after this IETF meeting. IPR review is a to-do item for the chairs.
- Segment Routing: Continued work on segment routing with good progress on services.
- RIFT Multicast: Revival of the multicast work is anticipated due to increasing interest in multicast for AI data centers. BIER integration with RIFT was also suggested.
- Multicast for MoE: Discussion focused on using multicast for Mixture of Experts (MoE) architectures in large language models (LLMs). The motivation is to reduce pressure on source GPUs and reduce congestion between nodes.
- MoE Multicast Challenges: The discussion acknowledged several challenges, including inconsistent sending methods (memory reads, DMA), coordination between large model coding and the network layer, packet encapsulation, and reliability.
- Dragonfly Routing: A proposal for dragonfly routing was presented, aiming for dynamic routing without silicon changes or complex VPN policies. The proposal involves a split-horizon approach based on topological ordering.
- Reliability of Multicast: Reliability was identified as a major obstacle to deploying multicast in AI networks, necessitating consideration of mechanisms for handling packet loss and retransmission.
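The motivation noted under "Multicast for MoE" can be illustrated with a small counting sketch. This is not taken from the session; the topology, node names, and the `link_loads` helper are hypothetical, and it assumes ideal replication at branch points with no loss or retransmission.

```python
from collections import Counter

def link_loads(paths, multicast):
    """Count packet copies per link for one message sent to several receivers.

    paths: one list of node names per receiver, all starting at the source.
    multicast: if True, a link shared by several paths carries a single copy
    (replication happens at the branch point); if False, every receiver gets
    its own unicast copy over every link on its path.
    """
    loads = Counter()
    for path in paths:
        for link in zip(path, path[1:]):
            if multicast:
                loads[link] = 1
            else:
                loads[link] += 1
    return loads

# Hypothetical fabric: gpu0 sends one token to three experts behind two leaves.
paths = [
    ["gpu0", "leaf1", "spine1", "leaf2", "gpu4"],
    ["gpu0", "leaf1", "spine1", "leaf2", "gpu5"],
    ["gpu0", "leaf1", "spine1", "leaf3", "gpu8"],
]
unicast = link_loads(paths, multicast=False)
mcast = link_loads(paths, multicast=True)

# Copies leaving the source GPU, unicast vs. multicast.
print(unicast[("gpu0", "leaf1")], mcast[("gpu0", "leaf1")])
# Total copies carried across all links, unicast vs. multicast.
print(sum(unicast.values()), sum(mcast.values()))
```

In this toy case the source link carries three copies with unicast and one with multicast, which is the reduction in source GPU pressure the discussion referred to. The sketch says nothing about reliability, which the group identified as the main open problem.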
Decisions and Action Items
- KV Store: Chairs to initiate IPR review for the KV store document. Authors to address document edits after IETF 123.
- MoE Multicast: Sandy to share the recording link of the MoE side meeting with the mailing list.
- Dragonfly Routing: Tony to update the dragonfly routing draft.
Next Steps
- Continue working on segment routing and auto-assess work.
- Further investigate the use of BIER in RIFT for multicast scenarios, specifically considering reliability aspects.
- Explore the potential of dragonfly routing within the RIFT framework, taking into account power constraints and reconfigurable optical switches.