RTGWG

Summary

The RTGWG session commenced with standard IETF administrative remarks and a working group status update, including the recent publication of the Topology Independent Loop-Free Alternate (TI-LFA) specification as an RFC. The bulk of the session was dedicated to presentations and discussions on emerging networking challenges driven by Artificial Intelligence (AI) and Machine Learning (ML) workloads. Key themes included:

Multicast for LLM Inference: Addressing the highly dynamic and low-latency multicast requirements of Mixture of Experts (MoE) architectures in Large Language Models.
Distributed AI Inference Networks: Defining requirements and use cases for networks supporting billions of concurrent AI inferences with low latency and high security.
Agent Networking: Exploring the networking implications of agent-to-agent collaboration, including architecture and use cases for making networks smarter and more efficient.
Proactive Network Resource Scheduling: Advocating for a holistic approach to network resource allocation for ML clusters to avoid congestion, complementing reactive solutions.
Network Notifications (Fentel follow-up): A significant portion of the session focused on clarifying the problem statement and exploring various approaches for fast, fine-grained, and lightweight network notifications, particularly for AI/ML and RDMA traffic in WAN and data center environments. This included proposals for router info advertisement, fast congestion notification with proxies, SRv6-based congestion control, and credit-based flow control.
Zero-Touch Routing: A heads-up presentation on a scalable, zero-touch routing approach for highly resilient control plane connectivity.

No consensus was reached on specific solutions for network notifications. Discussion continues, with a strong call from the AD and community to narrow the problem scope so that tangible progress can be made.

Key Discussion Points

Working Group Status:
- Topology Independent Loop-Free Alternate (TI-LFA) has been published as RFC 9855.
- No new drafts have been adopted since IETF 123.
- Several drafts are nearing Working Group Last Call, including BGP, RRP update, BFD, Multi-IS-D1, and QoS model. SRv6 protection and National Routing Revives also require attention.
- Sasha raised concerns about the scalability of the SRv6 Address Protection draft, citing the large amount of IGP information it would require. The authors were urged to address the comments on the mailing list.
Multicast in LLM (Presenter: Sam Dizan, ZTE):
- Problem: Large Language Models (LLMs) using Mixture of Experts (MoE) architectures require distributing "tokens" to a dynamically selected subset of "experts" (GPUs). This is a highly dynamic multicast use case with extremely short (microseconds) decision times.
- Proposed Solution: Bit Index Explicit Replication (BIER) was suggested as the best fit for this highly dynamic traffic pattern.
- Discussion: Related work is ongoing in BIER and PIM working groups on how to distribute traffic when destinations are known only microseconds before transmission.
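
As a rough illustration of why BIER suits this pattern: in BIER (RFC 8279) the ingress encodes the set of egress routers as a BitString in the packet header, so picking a different expert subset for each token changes only header bits, not multicast tree state. The sketch below is a minimal Python model of that idea. The expert-to-BFR-id mapping, the next-hop names, and both helper functions are hypothetical and were not part of the presentation.

```python
# Hypothetical mapping of MoE experts to BIER BFR-ids (1-based); illustrative only.
EXPERT_TO_BFR_ID = {
    "expert-0": 1, "expert-1": 2, "expert-2": 3, "expert-3": 4,
    "expert-4": 5, "expert-5": 6, "expert-6": 7, "expert-7": 8,
}


def build_bitstring(experts):
    """Return an integer BitString with one bit set per selected expert.

    Bit position (BFR-id - 1) is set, counting from the least significant bit.
    """
    bits = 0
    for name in experts:
        bits |= 1 << (EXPERT_TO_BFR_ID[name] - 1)
    return bits


def split_by_next_hop(bitstring, next_hop_masks):
    """Partition a BitString across next hops, as a BIER router conceptually does.

    next_hop_masks maps a next-hop name to the mask of BFR-ids reachable via it,
    a simplified stand-in for the forwarding bit masks of a BIER forwarding table.
    """
    copies = {}
    for hop, mask in next_hop_masks.items():
        subset = bitstring & mask
        if subset:
            copies[hop] = subset
    return copies


# The gating function picks experts 1 and 6 for this token (top-2 routing).
token_bits = build_bitstring(["expert-1", "expert-6"])
# Two hypothetical downstream switches, each reaching four experts.
copies = split_by_next_hop(token_bits, {"leaf-a": 0b00001111, "leaf-b": 0b11110000})
print(f"{token_bits:08b}", {hop: f"{b:08b}" for hop, b in copies.items()})
```

Each next hop receives a copy whose BitString carries only the experts reachable through it, which is how replication happens without per-group state in the fabric.
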
Distributed Inference Network (Presenter: Song Jin, China Mobile):
- Motivation: AI is rapidly changing internet usage patterns, shifting toward AI model access, multi-model interactions, and distributed AI inference. This brings new traffic patterns and requirements.
- Requirements: Support billions of concurrent inference sessions; provide low latency, minimal jitter, and extremely low packet loss; ensure high security (data/model isolation); enable network recognition of AI inference sessions (fine-grained workload identification, application-aware steering); and provide end-to-end telemetry (packet and token latency/throughput, inference efficiency).
- Use Cases: Enterprise security, edge-cloud collaboration, dynamic model selection, adaptive resource scheduling, privacy-preserving split inference.
Agent Networking Use Cases, Requirements, Architecture (Presenter: Jun Wang, Huawei):
- Background: Agent networking, where agents collaborate autonomously, is emerging (e.g., in enterprise, smart homes, industrial IoT). This relies on Agent Connection Protocols (ACPs).
- Architecture: Comprises an agent connection layer and a physical network layer. The network provides basic connectivity and can be made smarter by agents.
- Use Cases:
  - Bank union agent networking: Agents replace traditional API calls, enabling natural language interaction.
  - Network device agent networking: Fault diagnosis in data centers, campus network segmentation verification.
  - Controller-network device agent networking: Network performance monitoring and troubleshooting automation.
- Discussion:
  - Questions arose regarding the appropriate IETF area for this work (operations vs. routing). It was noted that agent gateways might involve routing mechanisms, while device/controller agents might be more operations-focused.
  - Suggestions included examining the agent gateway protocol for augmentation and considering a potential new working group for agent-to-agent communications.
  - A call was made for more specific proposals on what the IETF, particularly the routing area, should do.
Scheduling Network Resources for ML Clusters (Presenter: Pawan Biram):
- Motivation: Network performance significantly impacts ML compute. The draft advocates for proactive scheduling of network resources alongside compute resources to optimize job completion times.
- Proposal: Identify endpoints, determine network resource requirements, and reserve resources upfront. This approach aims to strategically avoid congestion, complementing reactive tactical solutions.
- Benefits: Reduced congestion, job isolation, and fast recovery using backup paths (mimicking TE principles like global repair). Applicable to DC, DCI, and Metro.
- Discussion:
  - The time requirements for scheduling depend on the job type, with a need for further experiments and numbers.
  - Reference was made to existing SRv6 Ops drafts for AI fabric that achieve similar goals without changes to SRv6.
  - Concerns were raised that traditional traffic engineering might be less impactful in homogeneous data center fabrics, suggesting focus on DCI and optical switches where dynamic topology changes offer more benefits.
  - The interaction between network reservations and congestion control mechanisms needs careful consideration.
  - The relationship to Network Resource Partition (NRP) work in the TEAS WG was noted.
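
The reserve-upfront idea in the proposal can be pictured as simple admission control: before a job starts, check that every link on its planned paths has enough unreserved capacity, and commit the reservation only if all of them do. The following is a minimal Python sketch under that assumption. The link names, capacities, and the `reserve_job` helper are hypothetical and are not taken from the draft.

```python
def reserve_job(free_gbps, job_paths, demand_gbps):
    """Try to reserve demand_gbps on every link of every path for one job.

    free_gbps: dict of link name -> unreserved capacity in Gbps, updated in
               place only when the whole reservation succeeds.
    job_paths: list of paths, each a list of link names.
    Returns True and commits if all links have room; otherwise changes nothing.
    """
    # Sum the demand per link first, since several paths may share a link.
    needed = {}
    for path in job_paths:
        for link in path:
            needed[link] = needed.get(link, 0) + demand_gbps
    # All-or-nothing check, so a rejected job leaves no partial reservation.
    if any(free_gbps.get(link, 0) < amount for link, amount in needed.items()):
        return False
    for link, amount in needed.items():
        free_gbps[link] -= amount
    return True


free = {
    "leaf1-spine1": 400, "spine1-leaf2": 400,
    "leaf1-spine2": 400, "spine2-leaf2": 400,
}
job_a = [["leaf1-spine1", "spine1-leaf2"]]
job_b = [["leaf1-spine1", "spine1-leaf2"], ["leaf1-spine2", "spine2-leaf2"]]

accepted_a = reserve_job(free, job_a, 300)  # fits on the spine1 path
accepted_b = reserve_job(free, job_b, 200)  # rejected: spine1 links have 100 left
```

A scheduler built this way could retry the rejected job on different paths or at a later time, which is where the interaction with reactive congestion control raised in the discussion becomes relevant.
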
Fentel (Faster Notification for Traffic Engineering and Load Balancing) Discussions: This block covered the "Problem Statement" and several solution drafts.
- Background: Following a Fentel BoF at IETF 123, the AD requested further clarification of the problem scope within RTGWG.
- Network Notifications Problem Statement (Presenter: Jay Dong, Huawei):
  - Use Cases: AI/ML traffic within/across data centers, CDN/DC interconnection, cloud-edge continuum, all requiring real-time network status.
  - Problems with Existing Approaches: Slow control plane signaling, insufficient (binary) information, suboptimal local decisions, excessive overhead from frequent updates.
  - Requirements: Real-time, fine-grain, lightweight notifications delivered to targeted nodes for optimized decisions/actions.
  - Discussion: What information to carry (event type, location, quantifiable metrics, affected paths/flows), who are the recipients (nodes, end hosts, controllers, applications), and delivery modes (unicast, multicast, hop-by-hop, flooding, subscription).
  - Community Feedback: Strong advice to narrow the problem scope, focusing on "fast notifications" in the data plane for local, nearby routers, particularly in WAN environments rather than within DC fabrics. The importance of understanding the "action" derived from notifications was emphasized.
- Router Info Advertisement (Presenter: Jeffrey Zhang):
  - Proposal: A UDP-based mechanism to advertise link utilization and other router information to neighbors for global load balancing (GLB) in AI data centers. It uses TLV encoding and supports fast/slow advertisements without re-flooding.
  - Discussion: Concerns were raised that the initial proposal was specific to a particular vendor's silicon, with a strong call from the WG chair for further generalization and input from various silicon vendors to ensure a standardized, interoperable solution.
  - The applicability of GLB within homogeneous data center clusters was questioned, with suggestions to focus on DCI/WAN scenarios.
- Fast CNP with Proxy (Presenter: Yin Zhang):
  - Motivation: Traditional ECN/CNP is too slow for long-distance AI/ML traffic in the WAN because of the round-trip latency involved. A direct notification from the congested node is needed.
  - Proposal: Introduce a proxy node (e.g., a leaf switch) to overcome current limitations: the congested node may not know whether the sender supports CNP, may lack the RoCEv2 mapping, or the sender may be outside the IP domain.
  - Mechanism: A proxy is selected and advertises its mapping. The congested node locates the proxy and sends it a specific notification (CN1); the proxy then sends CN2 to the traffic sender. This involves IGP/BGP extensions.
  - Discussion: Questions were raised about adding latency with a proxy to an already latency-sensitive problem.
- Credit-based flow control based on SRv6 Paths (Presenter: Yasu Lu, China Mobile):
  - Challenges: Imprecise upstream tracing with traditional PFC in the WAN, long latency on end-to-end SRv6 paths, and control overhead at SRv6 headends.
  - Proposal: Congested P-nodes send notifications (carrying the SID list, Slice ID, and queue information) to upstream P-nodes for local traffic control. If that is insufficient, the signal propagates to the headend for path rebalancing.
- Fentel for RDMA Transmission in WAN (Presenter: Jia Ruan, China Telecom):
  - Motivation: AI services require lossless RDMA transmission in WAN for distributed training/inference.
  - Solutions:
    1. Fast Notification: P-nodes detect link/node failures, congestion, or bandwidth changes, send notifications to ingress PEs for rerouting/load balancing. Uses IP-over-SRv6 or UDP-based notifications.
    2. Credit-Based Flow Control: RSVP-based mechanism where devices exchange credit values (buffer state) to adjust transmission rates.
  - Discussion: Warnings were given regarding ICMP rate limiting in host stacks/forwarding planes for notifications and the high complexity of per-flow credit-based flow control with RSVP.
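
The credit-based proposals in this block share a common core: the downstream node advertises how much buffer it can accept, and the upstream node sends only while it holds credit. The following is a minimal hop-by-hop sketch in Python. The class and method names are hypothetical, and it models neither the RSVP signaling nor the SRv6 encodings discussed in the session.

```python
from collections import deque


class Receiver:
    """Downstream node with a finite buffer that grants credits to its upstream."""

    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.buffer_slots = buffer_slots

    def initial_credits(self):
        # One credit per free buffer slot.
        return self.buffer_slots

    def accept(self, packet):
        # The sender's credit check guarantees the buffer never overflows.
        assert len(self.buffer) < self.buffer_slots
        self.buffer.append(packet)

    def drain(self, count):
        """Forward up to `count` packets and return that many credits upstream."""
        freed = min(count, len(self.buffer))
        for _ in range(freed):
            self.buffer.popleft()
        return freed


class Sender:
    """Upstream node that transmits only while it holds credits."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.initial_credits()
        self.backlog = deque()

    def send_pending(self):
        sent = 0
        while self.backlog and self.credits > 0:
            self.receiver.accept(self.backlog.popleft())
            self.credits -= 1
            sent += 1
        return sent

    def on_credit_update(self, credits):
        self.credits += credits


rx = Receiver(buffer_slots=4)
tx = Sender(rx)
tx.backlog.extend(range(10))

first_burst = tx.send_pending()   # stops once the four credits are used up
tx.on_credit_update(rx.drain(3))  # receiver forwards three packets, returns credits
second_burst = tx.send_pending()  # only three more packets can go
```

Because the sender stalls instead of overrunning the buffer, nothing is dropped. The cost is one unit of state per controlled flow or queue, which is the complexity concern raised about per-flow credits.
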
Kira: A Scalable Zero-Touch Routing Approach (Presenter: Roland):
- Motivation: Frequent control plane connectivity failures (e.g., major outages at Facebook, Google).
- Objective: Provide highly resilient, autonomous, zero-touch control plane connectivity, independent of specific data plane IGPs.
- Features: Scalable to thousands of nodes, works in diverse topologies, supports mobility, uses plain IPv6 forwarding.
- Status: An Internet-Draft exists, along with running code for a native routing daemon written in Rust that uses nftables. A side meeting was announced for further discussion.

Decisions and Action Items

The working group chairs requested active participation in BGP and Couter work to help move drafts to WG Last Call.
For Fentel-related work: The responsible AD and community strongly urged presenters to narrow the problem scope, focusing specifically on "fast notifications" within the data plane, particularly for WAN and DCI use cases where the benefits are most clear.
Presenters for all drafts were encouraged to incorporate feedback received during the session and continue discussions on the mailing lists.
The RTGWG will continue discussion on the problem space of network notifications, especially regarding whether to focus on general vs. faster notifications, data plane vs. control/management plane, and coordination with other layers.
The authors of the Router Info Advertisement draft were asked to generalize it beyond vendor-specific implementations and to seek input from different silicon vendors.

Next Steps

Continue refining draft proposals based on discussions and mailing list feedback.
Specifically for network notifications, the community is encouraged to contribute to narrowing down the problem scope and providing concrete examples of practical use and actions to be taken based on notifications.
Further discussions are expected for all presented drafts on the RTGWG mailing list.
A side meeting for the Kira draft is scheduled for IETF attendees interested in zero-touch routing.
The next IETF meeting will be held in Shenzhen.