Session Date/Time: 19 Mar 2026 01:00
This is a transcript of the Routing Working Group (RTGWG) meeting held during IETF 125 in Shenzhen.
Jeff Tantsura: Agenda. Now we can start. Good morning, everyone. Welcome to the IETF 125 Routing Working Group meeting in Shenzhen. Let's start with the chairs' update. We are fully automated, yes.
Jeff Tantsura: Note well. Please read the "Note Well." Please make sure you're familiar with all the BCPs and IETF guidelines on conducting yourself; please don't harass others, and be nice to each other.
Meeting tips: if you participate in person, please sign into the session and raise your hand if you want to get into the queue and talk. For remote participants, please make sure your audio and video are off; bandwidth is limited and life is short, so turn them on only when you are presenting.
So, minutes are collaborative. Please join via Meetecho for questions, and please check the notes and minutes before we publish them so we know we are publishing the right thing. And when you are speaking, please state your name and affiliation.
We've got, as always, a long and full agenda, so please be on time and don't eat into other people's time. We don't have any new RFCs since IETF 124.
So, we have adopted the draft-ietf-rtgwg-net-notif-ps problem statement, and we are looking forward to seeing how Fun-Tell, or FAN under its new name, develops. Eventually we might decide to spin out a new working group; there is still ongoing discussion on the charter. There are more and more surrounding documents being published, an information model and a data model, and we expect to potentially start the process of forming a new working group before IETF 126.
draft-ietf-rtgwg-multisegment-sdwan has been submitted for publication. Thanks to the authors for the great work here; there was a lot of wording to get right.
draft-ietf-rtgwg-srv6-egress-protection is in working group last call. Actually, the working group last call is formally done given the dates. The authors have responded to all the questions, so I'll do a review over the weekend and continue progressing the draft.
draft-ietf-rtgwg-bgp-pic, there are some formal changes. Yingzhen is joining the draft as co-author, and I'll be doing the shepherding. Please review all the new changes. I think Yingzhen did really great work making it more readable and fixing things that should have been fixed years ago.
So, there was a meeting with the draft-ietf-rtgwg-vrrp-bfd-p2p authors, and both drafts will be updated. We'll update the working group on the decision taken, but we are progressing with it.
draft-ietf-rtgwg-net2cloud-problem-statement has been returned to the working group again; I'm not sure how many times it has come back now. We need consensus on whether to progress it or not.
On the QoS model, there have been comments from a directorate review. It's on the agenda today.
draft-ietf-rtgwg-srv6-egress-protection, I think we reached consensus with LSR, so it will go into a joint last call with LSR.
And draft-ietf-rtgwg-dst-src-routing-revive, there are some comments that need to be addressed.
And if there are no questions, we'll start with the first presentation. Can we get one or two volunteers to help with the minutes? Anybody? Come on. Yeah, okay. Your virtual assistant. That one always works, no complaints, right? And as a side comment, please send your slides before the deadline. We have a deadline for a reason, and we will simply not allow you to present if you don't. For example, I received a slides update request this morning; I cannot guarantee that I will always see that sort of request. So, if you don't submit your slides before the deadline we specified, we cannot guarantee your presentation. Period. Yingzhen is being nice and says we cannot guarantee it; I can guarantee you won't.
So, QoS model.
Aseem Choudhary: Yeah, I am online. Thanks for sharing the slides. Okay, so hello everyone. My name is Aseem Choudhary, and I will be providing the updates for the QoS YANG model. Let me control the slides. Thanks.
Okay, since the last working group call we have received multiple comments through multiple reviews, and most of them have been about language refinement and improving the descriptions. We have taken care of almost all of the comments. We also got some comments regarding the data types used in filters and actions. Based on these comments, we have updated the draft to version 15, which has now been published.
So, I will try to cover two important comments. The first was regarding the DSCP range, which, per the comment, may violate RFC 2474. There was a discussion on the working group mailing list about that, and based on the resolution we reached there, we have modified the text in the document to state explicitly that the domain gateway routers have to respect RFC 8436, while internal routers within the domain may classify traffic based on DSCP ranges.
The next comment I want to highlight was regarding PHB selection on the nodes. The comment was that the draft had no detailed description of PHB selection: where the PHB is selected and who takes what kind of action. Based on the comment, we have modified the draft to explicitly state that the PHB is selected at the edge nodes as well as in the core of the diffserv domain. A BA (behavior aggregate) classifier maps DSCP-marked traffic to a particular PHB per node, and the DSCP-to-PHB mapping is local to the node. Indeed, multiple DSCP values may map to the same PHB at a single node, and the same DSCP may map to a different PHB at a different node in the network.
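As a rough illustration of the per-node behavior-aggregate classification Aseem describes (a minimal sketch; the DSCP values and PHB names below are illustrative, not taken from the draft):

```python
# Per-node DSCP -> PHB mapping: the mapping is local to each node,
# several DSCPs may share one PHB, and two nodes may disagree.
NODE_A_BA_CLASSIFIER = {
    46: "EF",    # expedited forwarding
    10: "AF1",   # two DSCP values mapping to the same PHB on this node
    12: "AF1",
    0:  "BE",
}
NODE_B_BA_CLASSIFIER = {
    46: "EF",
    10: "AF2",   # same DSCP, different PHB on a different node
    12: "AF1",
    0:  "BE",
}

def classify(ba_classifier, dscp):
    """Behavior-aggregate classification: the DSCP alone selects the PHB."""
    return ba_classifier.get(dscp, "BE")

print(classify(NODE_A_BA_CLASSIFIER, 10))  # AF1 on node A
print(classify(NODE_B_BA_CLASSIFIER, 10))  # AF2 on node B
```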
There have also been some comments about the data model, and essentially we have taken care of those. To list them: we had source and destination IPv4 and IPv6 addresses defined as a list that only ever held a single element, so we have redefined them as a leaf-list. In addition, the protocol field in the filter type was defined as uint8; based on RFC 9911, we now use the protocol-number type defined in the IETF inet-types module in RFC 9911. Finally, the QoS policy reference, meter reference, and queue reference were defined as strings even though we had defined templates for them in the same draft, so we have changed the type from string to a leafref pointing to the corresponding template.
So, as a next step, we'll essentially take a closer look at the last call review comments. We certainly want to address any remaining comments we may have missed. I also ask people from this community to review the draft again and pass along any further comments; we plan to take care of those in the next version of the draft. Questions?
Yingzhen Qu: I think the comments you have been addressing are directorate review comments. The document hasn't gone through working group last call yet.
Aseem Choudhary: Okay, I see. Okay, yeah. I mean, we asked for last call, but yes, it still has to happen.
Jeff Tantsura: So, mostly good updates; the draft is now more concise. Please bring it to the finish line. We would really like to start working group last call. It's about time.
Aseem Choudhary: Okay, sure. Thank you.
Jeff Tantsura: Thank you.
Slide: 03-Fast Network Notifications Problem Statement
Jie Dong: Good morning everyone. This is Jie Dong from Huawei. I'm going to give an update on the Fast Network Notifications Problem Statement on behalf of all the authors and contributors.
So, this is a quick recap of the background for people who are not familiar with this work. The motivation comes from AI/ML training services, but we also consider the requirements of other cloud-based services. The network needs to be more adaptive and to react faster, to ensure services can be carried reliably and congestion-free within or among data centers. What we need is a good and timely understanding of the network operational status, including congestion situations and failures, to help improve network utilization, reduce service latency, and enable faster response to other critical events. This document describes the problem statement of fast network notifications, and it is used as a supporting document for the incubation of the new working group on fast network notification, called FAN.
Here are the updates since the last presentation at IETF 124. We have received many comments and good suggestions and incorporated them into the current version; we would like to thank the people who reviewed the draft and contributed text. First, on this page we list the changes made before the adoption call. We have a new co-author, Rashad, and the scope has been narrowed down to fast network notification according to the feedback at the IETF 124 meeting. The structure of the document has been reorganized to improve readability; thanks to Adrian for the help. The introduction and Section 3 have been enhanced with text about why fast notification is needed and what fast notification means. The description of the existing mechanisms has been revised to be more accurate, for example the descriptions related to BFD and ECN. We also added a subsection about possible actions, which clarifies that the action mechanisms themselves are out of scope for this work. And the security considerations have been enhanced.
We also received a lot of good comments during the adoption call, and the draft has been revised to address them. First, we revised the abstract to address the comments on losslessness; I think it is now more accurate. We also clarified explicitly in the draft that the scope is limited to fast network notification, and mentioned that fast notification applies to a range of network scenarios and topologies, while the solutions for different cases may differ. We reduced the text in the recipients subsection to focus on the receivers as entities. The delivery mechanisms for fast network notification have been expanded to include subscription-based approaches. There were also many editorial changes to polish the text. And since it's a working group document now, a GitHub repository was created for collaboration; here is the link. For now, all the comments and issues have been addressed and the issues have been closed. We welcome your review and further comments, either on the mailing list or on GitHub.
Here I would also like to mention the progress of the Fun-Tell, or FAN, side meeting this week. The goal was to polish the charter text in a face-to-face meeting, which is more efficient, and we really did that. We had very constructive discussions and comments during the meeting; thanks to all the participants. We also reached agreement on the term or name used for this potential working group, which is FAN. We updated the charter text based on the comments received during or before the side meeting on GitHub, and for now all the issues have been closed. Here is the GitHub repository link. Please review and continue the discussion on the Fun-Tell list; you can also raise issues on GitHub.
For the next steps, the authors of the problem statement draft will update the draft to reflect the comments received during the side meeting, because some of the scoping and descriptions from the side meeting can be used in the problem statement draft. We welcome further review and comments on this draft; you can either send comments to the list or raise issues on GitHub, so that we can reach a relatively steady state as soon as possible to guide the future work. Thank you.
Jeff Tantsura: Any questions? Jeff, go ahead.
Jeff: Hi Jie, this is Jeff. The security mechanisms are very likely going to introduce latency into the feedback mechanism. And since security mechanisms almost inevitably slow things down, one of the things the draft should talk about is what the latency requirements are for the feedback mechanisms, for this to actually accomplish the desired goal.
Jie Dong: Yeah, very good question. During the side meeting, people also mentioned this: how much we want to take security into consideration at the beginning of the work. You really want to balance security against the effectiveness of the notification. This is something we may revise the text to reflect.
Jeff Tantsura: So, I think we'll happily reuse the definition of limited domain that SRv6 brought, and, you know, as long as you're inside it, no security. Pretty much, yeah. Thank you.
Yingzhen Qu: There were some discussions on the charter about the domains the technology applies to. The charter will be updated to reflect that.
Jie Dong: Yeah, we already have some text saying the domain is under single administrative control, so we may start from that. Okay.
Jeff Tantsura: So again, there was some formalization of the term "limited domain" during the SRv6 work. I think we can reuse it.
Jie Dong: Yeah, yeah, sure.
Jeff Tantsura: And to add to that, Yingzhen and I have started working on a data model, so probably sometime next week we'll publish the first revision of the data model to formalize all the discussions. Good. Any other questions, comments? No. Thank you, Jie.
Jie Dong: Thank you.
Slide: 04-IP Fast Reroute for AI/ML Fabrics
Roy Yang: Thank you, chairs. Ladies and gentlemen, I'm Roy Yang from Alibaba Cloud. Today I would like to present our draft on behalf of the other co-authors. This draft is about IP Fast Reroute for AI and machine learning fabrics, and this is the first time we present it at the IETF.
So, first, a reminder about AI backend networks. The scale-up network, which is the network between the GPUs, is out of scope of this discussion; let's focus on the scale-out and scale-across networks. The scale-out network is typically a network within the data center. It typically has a two- or three-tier folded-Clos architecture, and sometimes it leverages a multi-plane and multi-rail design. From the routing protocol perspective, it typically runs EBGP: all of the nodes talk to each other with EBGP, and there is no IGP. For resiliency, it relies heavily on ECMP for the fastest switchover. The scale-across network is another story. Typically, a scale-across network is a wide-area network, so the topology is more arbitrary, and the routing protocols are typically an IGP plus BGP. As for resiliency mechanisms, besides ECMP, we use LFA, the loop-free alternate, or TI-LFA, a mechanism that relies on segment routing to achieve topology-independent LFA.
And we found that there are some limitations in the existing mechanisms. The first is that the current mechanisms are very slow, because they depend on the CPU to activate the resiliency mechanism: a CPU needs to be involved to react to the failure, which may take hundreds of milliseconds. But AI/ML workloads may need convergence times on the order of a hundred microseconds, so it's very slow if the CPU is involved. The second is that all of the current protection mechanisms only handle local failures, which means only the node that suffers the failure can take action, to do an ECMP fast switchover or to enforce a TI-LFA path. But ECMP is not always available in the network. For example, in a three-tier Clos architecture, when the packet goes from the ingress leaf to the spine there are ECMPs, and from the spine to the super-spine there are ECMPs; but from the super-spine down to the next spine, and from that spine to the next leaf, there are no ECMPs. That's the case for the DC network. In the WAN, or scale-across network, if you use TI-LFA it will cause hair-pinning of the traffic, which is not optimal. That's the second limitation. The third is that all of these fast reroute mechanisms cannot handle capacity reduction or quality degradation. They work in a binary way, yes or no only. But we hope the network can handle such cases in a smarter way and be aware of the different situations, for example when capacity is reduced by a partial link failure, such as a member-link failure within a bundle rather than a whole-bundle failure.
So, based on this analysis, we propose a framework with four enhancements to improve fast convergence performance in an AI/ML fabric. The first is hardware-accelerated protection activation. The second is hardware-accelerated network notification. The third is a mechanism to achieve complete topology visibility in a BGP-based network. The last is quality-aware and remote protection. We will go through these four points one by one.
Hardware-accelerated protection activation is fundamentally a matter of local implementation. It leverages extended NPU capabilities to activate the protection without CPU involvement, so it is very fast. And we hope this hardware-accelerated activation can handle both local failures and remote failures. In the latter case, we rely on the second mechanism: hardware-based network notifications spread or exchanged within the network. We have seen extensive discussion of this in the Fun-Tell context, so I will not repeat it; there are two very helpful drafts covering the problem statement and the standardization effort in the IETF.
The third enhancement is complete topology visibility. As you know, in an IGP-based network, the link-state routing protocol exchanges link-state information, flooding it across the whole network, and every node has visibility of the topology, so it can make decisions based on the topology to react to a failure, for example to build a loop-free alternate path. But in a typical DC network for AI/ML we use BGP, which is a distance-vector routing protocol: it does not exchange link-state information, so every node running BGP is only aware of the topology around itself, not beyond its neighbors. To resolve this, we propose to use BGP to exchange link-state information, so that all of the nodes can do TI-LFA.
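As a minimal sketch of what complete topology visibility enables, here is the classic loop-free alternate feasibility check from RFC 5286 (a neighbor N of S protects destination D if dist(N,D) < dist(N,S) + dist(S,D)), computed on a toy fabric with hop-count metrics; the topology and node names are illustrative assumptions:

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop-count distances from src over an unweighted fabric graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def loop_free_alternates(adj, s, d, primary_nh):
    """Neighbors of s satisfying dist(N,D) < dist(N,S) + dist(S,D)."""
    dist_from = {n: bfs_dist(adj, n) for n in adj}
    return [n for n in adj[s]
            if n != primary_nh
            and dist_from[n][d] < dist_from[n][s] + dist_from[s][d]]

# Toy two-tier fabric: leaves l1, l2; spines s1, s2.
adj = {"l1": ["s1", "s2"], "l2": ["s1", "s2"],
       "s1": ["l1", "l2"], "s2": ["l1", "l2"]}
print(loop_free_alternates(adj, "l1", "l2", primary_nh="s1"))  # ['s2']
```

This check needs every node's distances to every other node, which plain BGP reachability does not provide; that is the gap the proposed link-state exchange over BGP is meant to close.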
Yingzhen Qu: Roy, you have 30 seconds.
Roy Yang: Okay. The last one is quality-aware remote protection. The idea is to let a node advertise, to notify the failure with different types and more detailed information, so that other nodes can react to the failure. I think that's basically all I have for this session, and we are looking forward to any feedback from the community. Thank you.
Jeff Tantsura: Great.
Francois Clad: Yes, thank you. Did I hear you right that you said you need to achieve a 100 microsecond switchover time?
Roy Yang: Yeah.
Francois Clad: Okay. So what failure detection do you envision using?
Roy Yang: Yeah, we expect to rely on hardware-based failure detection, for example the NPU or the driver detecting any failure of the link in a very fast way.
Francois Clad: Well, for example, BFD is implemented with hardware assistance. Would that be something you believe would be suitable?
Roy Yang: No, because BFD is hardware-based for sending and receiving the probes, but it still involves the CPU to handle the situation when keepalive packets are lost. So you still rely on the CPU, and the CPU is the source of the slow processing. We want to totally bypass the CPU and make the NPU itself react to the failure.
Francois Clad: But how can you guarantee detection of a network link failure if there is nothing being transmitted on the link? So...
Roy Yang: Okay, yeah, that's another topic. Given the limited time, we can discuss it on the mailing list.
Jeff Tantsura: Let's take it to the list.
Francois Clad: Okay.
Jeff Tantsura: I think it's local implementation detail and, it's a very good topic, right? I think if you look at-
Francois Clad: I think it's an interoperability issue, but yeah, okay, let's take it to the list.
Roy Yang: Yeah, I agree with you. Yeah.
Jeff Tantsura: So, in fact, we can use-
Speaker from China Mobile: I'm from China Mobile. And I think it is a very good presentation. About a year ago, we presented another solution very similar to yours. Maybe we can talk offline. Yeah, thanks.
Roy Yang: Sure. Thank you. Thank you.
Jeff Tantsura: As a working group chair, it's a great pleasure to see all these technologies we've been working on for 20 years coming back home. It's full circle, right? The IP FRR work started 15-20 years ago and is still coming back, so I really welcome this work; it's interesting. Please clearly separate the data center network from the WAN network. What you call scale-across, or actually what we call scale-across, is nothing other than a regular WAN network: all the technologies we've developed over the years, from RSVP-TE fast reroute to TI-LFA, are fully available there. In a regular topology the situation is different; you actually know the topology perfectly. So focus on a complete separation in the problem statement, and define solutions for both.
Roy Yang: Sorry, I didn't quite catch that- decouple the data center network from the WAN network, point by point. Okay, sure. Yeah, that makes sense. We will do that. Thank you.
Slide: 05-Efficient Remote Protection
Jeff Tantsura: Okay, let's go to the next one. Francois.
Francois Clad: Okay. Hey, good morning. I hope you can hear me well. I'm Francois Clad from Cisco, and I'm going to present the efficient remote protection draft on behalf of the co-authors. This is a fairly new draft; it was submitted a few weeks ago, so this is the first presentation we're doing on it, and we welcome any feedback from the working group.
Okay, so this follows on from the previous presentation, in particular the idea of efficient remote protection, or remote protection that is also quality-aware; we'll see that in more detail through the slides. I first wanted to highlight two limitations of current IP fast reroute technologies, in particular in AI data centers, because of all the new requirements that AI workloads impose.
The first limitation is that this is local-only protection: only the node that is local to the failure, the one that detects it, takes action to protect the traffic. Of course, the routing protocol (IGP or BGP) will eventually reroute, but the point of fast reroute is to be faster than the IGP and to take care of delivering traffic in the meantime. That local protection can be ECMP or LFA, and that's great when we have it, but we don't always have an ECMP or LFA available. If we don't, we can use other mechanisms like remote LFA or TI-LFA, but those can create hair-pin issues, as we see in this example.
So we have a leaf L1 sending traffic to L2 via a spine S2, and there is a failure from the spine to the destination leaf. S2 enables its TI-LFA path, sending the traffic on the post-convergence path towards the destination, via L1, then S1, then L2. That's normal; that's the post-convergence path from S2's perspective. But from the perspective of L1 it's not very efficient, because we're sending the traffic from L1 to S2, then back to L1, then to S1, and finally to the destination. That's what we call a hair-pin, and it's the kind of thing we want to avoid, particularly in AI data centers where latency and bandwidth are so critical.
Another limitation of current fast reroute, as Roy mentioned earlier, is that there is no load or quality awareness. Fast reroute reacts to a complete failure, but if there is a link degradation or partial failure, it doesn't react at all. Let's take an example similar to the one before, but here we are using link bundles, 2x10 gig, which gives us a 20 gig bundle. We have a partial failure on one of the member links from S2 to L2, so instead of 20 gig we only have 10 gig available. But the leaf L1 doesn't know about that, so it keeps sending 20 gig and doing its regular ECMP between S1 and S2. At S2 we only have 10 gig available out of the 20 that L1 is sending, and that creates congestion.
Okay, so in order to solve that issue, we are proposing a new mechanism that we call efficient remote protection. Let's start with the failure case, which is maybe easier. On the left-hand side we have the example from before, with the hair-pin problem. But if we had a mechanism for S2 to notify L1 that there is a failure from S2 to L2, then L1 could actually act on it and could have a backup path protecting against that remote failure. That protection would send the traffic to the spine S1 directly and then to L2, so the traffic won't have to go to S2 and back to L1. Of course, as I said, we need a notification of that failure, and that leads back to the work on FAN, or Fun-Tell, on fast failure notification; this would be a requirement for those kinds of solutions.
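A minimal sketch of the reaction Francois describes: on receiving a remote-failure notification, the ingress leaf steers traffic onto a precomputed backup next hop instead of letting the traffic hairpin. All names, message fields, and the FIB layout are illustrative assumptions, not from the draft:

```python
# Normal ECMP set per destination, and a precomputed backup next hop
# to use when a given remote link is reported down.
PRIMARY_NH = {"L2": ["S1", "S2"]}
BACKUP_NH = {("S2", "L2"): "S1"}   # if the S2->L2 link fails, use S1 only

def on_failure_notification(fib, notif):
    """Remove next hops whose downstream link just failed."""
    failed_link = (notif["node"], notif["peer"])   # e.g. ("S2", "L2")
    for dst, nhs in fib.items():
        if notif["node"] in nhs and failed_link in BACKUP_NH:
            fib[dst] = [BACKUP_NH[failed_link]]    # avoid the hairpin via S2

fib = {dst: list(nhs) for dst, nhs in PRIMARY_NH.items()}
on_failure_notification(fib, {"node": "S2", "peer": "L2"})
print(fib)  # {'L2': ['S1']} -- traffic now goes L1 -> S1 -> L2 directly
```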
Before I go to the quality aspect and the partial failure, I wanted to highlight something about this failure notification. We had some discussion on Monday during the side meeting about who should receive the failure or network notification: should it be limited to one hop, or should it go further? Here I wanted to highlight one interesting case: in a three-tier folded-Clos network, like we see in some AI data centers, if we have a failure between the spine before the destination leaf and that destination leaf, then we may actually need to go three hops away to find an efficient remote protection. The problem is that the local node that detects the failure (SD1) doesn't really have any efficient protection available. We could send the traffic back, but that creates a hair-pin; or we could send it to another leaf, then back to another spine, and finally to the destination leaf, but that's not efficient either. So there is no efficient local protection. If we go back one hop to the super-spine level, there is really nothing we can do, because the super-spine has no ECMP: it is only connected to one spine in the destination pod, so it can't really do anything. Then let's go back one more hop to the spine in the source pod. Here we are connected to two different super-spines, but those two super-spines are themselves connected to the same spine in the destination pod most of the time, so again it doesn't help. In the end, the only way to efficiently protect against that failure is to go back all the way to the source leaf, LA1 in this example. That means three hops away, so the failure notification will need to travel three hops.
Okay, then a note on quality awareness. If we go back to the earlier quality example, we have the same partial failure: the failure of a member link of the bundle between S2 and L2 reduces the capacity to 10 gig. If we send a degraded-quality notification, another type of network notification, from S2 to L1, then L1 can act on it and adjust its load balancing so that it no longer sends 20 gig to S2, but only 10 gig.
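A sketch of that load-balancing adjustment, assuming weights simply proportional to each path's advertised remaining capacity (the rescaling policy is an assumption for illustration, not specified by the draft):

```python
def wcmp_weights(capacity_gbps):
    """Weighted ECMP shares proportional to each next hop's capacity."""
    total = sum(capacity_gbps.values())
    return {nh: c / total for nh, c in capacity_gbps.items()}

paths = {"S1": 20, "S2": 20}
print(wcmp_weights(paths))   # {'S1': 0.5, 'S2': 0.5} -- 10G each of 20G offered

# Degraded-quality notification arrives: the S2->L2 bundle drops to 10G.
paths["S2"] = 10
print(wcmp_weights(paths))   # {'S1': 0.667, 'S2': 0.333} -- of the 20G offered,
                             # S2 now carries ~6.7G, within its 10G capacity
```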
Finally, I've been talking a lot about ECMP, fast reroute, and data center topologies, but I just wanted to add that the kind of solutions we describe in the draft are in fact applicable to any topology. I understand that the requirements in the AI data center and the requirements in the WAN are totally different; I'm not saying this is one-size-fits-all, not at all. I'm just saying that we have a generic solution here, but of course the primary focus is the AI data center.
And I think I am out of time, so that's good; I reached the final slide. As I said before, this is the first presentation of a new draft, so we welcome any feedback, comments, or contributions from the working group on this work. Thank you.
Speaker from China Mobile: I'm from China Mobile. First of all, thank you very much for your presentation. I think we identified the same problem and proposed similar solutions. I sent a message through the chat; please see the draft I mentioned. Maybe we can talk more offline. Thank you.
Francois Clad: Yeah, of course. Thank you very much.
Jeff Tantsura: Speaking as a working group member. A couple of observations here, actually addressed to all the drafts previously presented. Please don't forget there's an extremely robust load-balancing framework implemented on the host. How does the host know where to send traffic? It runs congestion control that, as of today, is RTT-based pretty much everywhere. So hair-pinning is absolutely detrimental to collective performance: if part of your QP goes over a longer path, the entire collective will suffer really badly. We really don't want to do bypasses in this network; and not to forget, it's really an equidistant leaf-spine network, so you don't need a bypass. Number two, we usually have hardware probes implemented on the host side that are very fast; a normal period would be about one probe per RTT, so think 5 to 10 microseconds. We must make sure the in-network fast reroute doesn't interact in weird ways with the on-host fast reroute. Some of us remember ATM over IP fast reroute and what happened when they started to interact with each other; this is a very similar situation. Make sure the timing is right so that you don't fight the host-based notification, because this is how the host changes planes and potentially uses different GPUs. This is not really open knowledge: if you run an AI network you know this; otherwise you don't. I'll try to guide you to the degree I can share, but practically, keep in mind there's very robust host and transport machinery on the host, and the network is just in between. It must not intervene in any unhealthy way.
Jeff Tantsura: Yeah, and a very quick question. In the multi-hop scenario, how does the sending node know which are the receivers of the notification? In the multi-hop scenario, of course you want to multicast, right?
Francois Clad: That's a very good question, thank you. I think part of it will have to be answered by whatever solution we come up with for the network notification. But in my opinion, we would need some kind of subscription mechanism: a way for the nodes that are interested in those notifications to make themselves known to the sender so that they can receive them. But again, that's just my opinion, and it's a topic that is open for discussion.
Jeff Tantsura: Thank you. Just as working group chair now: I believe the answer will come from Fun-Tell, or FAN, on whether the encapsulation of the fast notification is layer 2, so one hop by definition, or IP/UDP, which could potentially be multi-hop, or something in between. So we would expect this to come from the Fun-Tell work. Thank you. We'll go to the next presentation. Just a reminder, we have a very tight schedule, so all presenters, please try to stay within your allocated time slot, including Q&A if you are expecting questions. Thank you.
Roy Yang: Okay. I will give a brief update on the FARE in scale-up networks draft. Changes since the 01 version: the addition of five co-authors, most of them from GPU vendors, and the addition of some considerations on memory-semantic operations. Here I will clarify the motivation for this draft. As you know, Ethernet-based scale-up networking has become a dominant trend; here I list some concrete examples. In Ethernet scale-up networks there are multiple planes, with each GPU multi-homed to each plane. The traffic in scale-up networks has characteristics such as low entropy, elephant flows, and burstiness, so traditional static ECMP is not applicable in this scenario. Adaptive routing has been widely recognized as a mechanism to improve load balancing in multi-plane networks. FARE using BGP is a standards-based adaptive routing mechanism originally designed for scale-out networks, but it's straightforward to extend this protocol to scale-up networks. That means GPUs could exchange BGP routes with the switches and then perform adaptive routing.
Here I give a concrete example of a one-tier scale-up network. Each GPU is multi-homed, equipped with multiple high-speed ports, and at least one sub-port is connected to a given network plane. The reasons for using multiple links between a given GPU port and each plane include that it makes it easy to scale up the size of the superpod. For example, if you need to add more GPUs with a single layer of network, you can use fewer links between the GPU and the switches within a given plane; or you can add another layer, for example a top layer of switches. That's why multiple links are popular in this scenario.
In this example, if no link-down event has happened, then when the source GPU performs load balancing, it uses the same weight for the ECMP route towards each network plane. If one sub-link of a given plane is broken, the weight of the ECMP route towards that plane is reduced by 50%. If the whole plane is broken, the weight of the ECMP route towards that plane is set to zero. The logic is very simple: it's bandwidth-capacity-weighted WCMP, and you can perform per-flow WCMP or per-packet WCMP. Per-flow WCMP is useful for ordered packet delivery mode, and at least one RDMA QP needs to be established between a given pair of BGP peers for at least one sub-port; of course, the switches in this scenario need to perform per-flow load balancing. For per-packet WCMP, a single QP is good enough for a given pair of GPUs, and in this case both the GPUs and the switches can perform per-packet WCMP, also called packet spraying.
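A toy rendering of the weight rules just described, together with the per-packet WCMP ("packet spraying") variant; the data structures and function names are illustrative assumptions, not from the draft:

```python
import random

def plane_weights(planes):
    """Weight per plane = fraction of that plane's sub-links still up:
    1.0 when healthy, 0.5 when one of two sub-links is down, 0.0 when dead."""
    return {p: up / total for p, (up, total) in planes.items()}

def pick_plane_per_packet(weights):
    """Per-packet WCMP (packet spraying): weighted random plane choice."""
    planes, w = zip(*[(p, w) for p, w in weights.items() if w > 0])
    return random.choices(planes, weights=w)[0]

planes = {"plane1": (2, 2), "plane2": (1, 2), "plane3": (0, 2)}
w = plane_weights(planes)
print(w)                         # {'plane1': 1.0, 'plane2': 0.5, 'plane3': 0.0}
print(pick_plane_per_packet(w))  # plane1 ~2x as often as plane2; never plane3
```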
When implementing memory-semantic operations, we can support three modes: weak ordering, strong ordering, and passive ordering. Since we provide flexible selection between per-packet and per-flow load balancing, it's easy to meet the requirements of all three modes. Currently, a POC of a 64-GPU superpod is almost complete, while a 1K-GPU superpod is on the way. Any comments or suggestions?
Jeff Tantsura: Jeff Tantsura, NVIDIA. Two questions. First, most of the scale-up frameworks mentioned envision non-IP encapsulation. Wouldn't that mean we would need something like EVPN to provide non-IP identifiers?
Roy Yang: You mean there is no IP traffic in-
Jeff Tantsura: The endpoints are GPU identifiers, they are not IP addresses.
Roy Yang: But some GPU vendors can still support IP-based encapsulation.
Jeff Tantsura: Absolutely. But again, if you go through the list of frameworks, most of them envision compressed Ethernet-like headers. It's not really Ethernet, but it's definitely layer 2. Don't you think we would need something that can allow transport of opaque identifiers, such as EVPN, for example?
Roy Yang: Let me try to understand your point. You mean in a single-tier network, you don't need IP-based forwarding?
Jeff Tantsura: I'm not saying whether it's needed or not. I'm just commenting on frameworks you've mentioned. Most of them envision non-IP encapsulated traffic.
Roy Yang: Okay, let's talk offline.
Jeff Tantsura: Second question: in a single tier there's no routing, there is no BGP. Practically, if anything, you need some form of fast notification.
Roy Yang: No, we are introducing BGP to this scenario. Of course, there is no BGP there currently, but we are pushing it forward.
Jeff Tantsura: What's the direction?
Roy Yang: BGP on the host.
Jeff Tantsura: Yeah, BGP running on the host is common. Even NVIDIA's tech stack can support running BGP on the host; you can confirm with some of your colleagues. Okay.
Slide: 07-Use cases and Requirement for Flow Control Collaboration Across DCNs and WAN
Hanjunxin: Good morning everyone. This is Hanjunxin from China Unicom. Today I'm going to share our work on flow control collaboration across data center networks and wide area networks, based on the two drafts below.
Let's start with the background. Currently, PFC and ECN are widely adopted in data centers for lossless transmission. With the rapid growth of distributed AI training and inference across geographically separated data centers, the demand for congestion-free data transfer has expanded from the DCN to the WAN. As we can see from the table, the WAN is very different from the DCN: it is large-scale, with a complex, dynamic topology and long RTTs, and the traffic has frequent microbursts. All of this creates significant challenges for achieving congestion-free data transmission.
Why not just apply PFC to the WAN? PFC is a layer 2 mechanism providing port-level feedback for hop-by-hop backpressure over Ethernet. It works well in the DCN, but when applied to the WAN it has critical limitations, including head-of-line blocking, deadlocks, and congestion spreading. These issues degrade network throughput and utilization in the WAN. So to address these limitations, and based on the Fun-Tell problem discussion, we propose fine-grained flow control (FGFC) as an enhancement of PFC designed for the IP WAN. It is a fast, precise, layer 3 congestion control mechanism operating in the data plane for millisecond-level response. FGFC enables precise flow control at the level of individual entities, unlike PFC's port-level control; it limits flow control to specific pipes and slices, and it provides intelligent preemptive congestion backpressure based on traffic prediction. The backpressure message can be carried over network protocols such as ICMP, UDP, etc.
And why do we need to consider the co-deployment of PFC and FGFC? Data center interconnection over the wide area network requires congestion-free data transfer across data centers. Since PFC is already deployed within the DCN and FGFC is a new mechanism used in the WAN, the edge nodes R1 and R5 in the figure, which serve as the boundary between the two domains, need to coordinate PFC and FGFC to achieve end-to-end flow control.
There are two types of deployment scenarios. Scenario one is single-hop direct interconnection: the FGFC messages are sent directly between the edge nodes R1 and R5 through an established tunnel. Scenario two is multi-hop interconnection, where the intermediate nodes participate in the congestion control process; depending on the capability of the intermediate nodes, it supports both hop-by-hop backpressure and cross-hop backpressure. This flexibility provides compatibility with existing network devices in practical deployments.
And how do PFC and FGFC interwork with each other? When the DCN is congested, it triggers backpressure on nodes in the WAN. For example, when congestion occurs at data center B, it sends standard PFC frames to the edge node R5. R5 buffers the affected traffic, and when a buffer threshold is reached, it generates an FGFC packet with priority AF1, slice ID, pause time, etc., toward the upstream node RX. If RX cannot resolve the congestion, it continues sending the backpressure message upstream. Conversely, from FGFC to PFC, when the WAN is congested, it triggers backpressure in the DCN. For example, congestion occurs at node RX in a specific slice with priority AF1; RX sends an FGFC packet to the edge node R1 with AF1, slice ID, and pause time. R1 buffers the matching tenant traffic, and when a buffer threshold is reached, R1 generates standard PFC frames toward data center A. Data center A then pauses all the affected traffic at the corresponding port.
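A toy sketch of the DCN-to-WAN direction of this translation at the edge node. The field names (priority, slice ID, pause time) follow the talk; the message layout, threshold, and numbers are illustrative assumptions only:

```python
from dataclasses import dataclass

@dataclass
class FGFCMessage:
    priority: str    # e.g. "AF1"
    slice_id: int    # the network slice the backpressure is scoped to
    pause_ms: int    # requested pause duration

BUFFER_THRESHOLD = 8 * 1024 * 1024   # bytes; illustrative threshold

class EdgeNode:
    """Boundary node (R1/R5): absorbs PFC-paused traffic from the DCN,
    then escalates to FGFC toward the WAN when its buffer fills."""
    def __init__(self):
        self.buffered = 0

    def on_pfc_pause(self, backlog_bytes):
        self.buffered += backlog_bytes
        if self.buffered >= BUFFER_THRESHOLD:
            # Scoped, flow-level backpressure instead of pausing a whole port.
            return FGFCMessage(priority="AF1", slice_id=7, pause_ms=5)
        return None

edge = EdgeNode()
print(edge.on_pfc_pause(9 * 1024 * 1024))
# FGFCMessage(priority='AF1', slice_id=7, pause_ms=5) -> sent upstream to RX
```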
As the edge node plays an important role in seamless FGFC and PFC collaboration, here are the requirements. First, coordination and bidirectional translation between WAN and DCN flow control, including protocol conversion, semantic mapping, and policy alignment, plus native support for network slicing. Second, effective response to DCN PFC frames: the edge node needs to learn the flow-to-port mappings, manage the buffer resources, and generate FGFC packets for the WAN. Third, effective response to FGFC messages from the WAN: the edge node uses the flow-to-port mapping to determine the target DCN ports and generates standard PFC frames for the DCN. These requirements specify the edge node capabilities and guide device implementation. Besides this, we have also done several trials for validation. In Hangzhou Xinhua, we deployed storage-and-computing-separated AI training over the WAN, and in Shanghai we deployed distributed inference using model split learning. These trials achieved over 90-95% computational efficiency compared to the centralized approach. Both trials used FGFC in the WAN and PFC in the AIDC to guarantee RDMA performance over long distances, and the gateway devices met the requirements mentioned above for the collaborative deployment, successfully achieving end-to-end flow control.
For the next step, we are seeking more reviews and comments, and we will maintain alignment with the Fun-Tell work and related congestion control drafts. Thanks for your attention.
Jeff Tantsura: Very interesting work, thank you. I have one request for the draft. As you know, buffer space is required to absorb in-flight traffic after a pause message, and there's a formula to calculate buffer space per distance per traffic class; I believe it's about 1 megabyte per 20 kilometers, unfortunately. If you could include a table of memory requirements based on distance in the WAN, that would be very useful for the draft.
Hanjunxin: You mean, could I put the whole- oh, sorry, could you explain? You mean the buffer headroom, or...
Jeff Tantsura: Yeah, based on the distance on the WAN links.
Hanjunxin: Oh, I see. The distance in our trials is 300 kilometers, so yes, we need a big buffer headroom on the device. And as a network operator, we have seen several new devices with large buffers. Yeah.
Jeff Tantsura: If you could include a table of memory requirements based on distance in the draft, that would be very helpful for readers to understand it.
Hanjunxin: Okay, thanks for your suggestion.
Jeff Tantsura: At least put some information in, maybe as an appendix, if that works for you.
Hanjunxin: Okay, we will improve our draft. Thanks for your suggestion.
Jeff Tantsura: Thank you.
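A back-of-the-envelope version of the headroom calculation Jeff is asking for: traffic keeps arriving for roughly one round trip after a pause is sent, so headroom scales with distance times line rate. The line rate and fiber propagation speed here are assumptions for illustration:

```python
def headroom_bytes(distance_km, rate_gbps=100.0, fiber_km_per_ms=200.0):
    """Worst-case per-class headroom ~ RTT * line rate."""
    rtt_ms = 2 * distance_km / fiber_km_per_ms   # light in fiber ~200 km/ms
    return rtt_ms * rate_gbps * 1e9 / 8 / 1e3    # bits/s over the RTT -> bytes

for km in (20, 100, 300):
    print(f"{km:4d} km -> {headroom_bytes(km) / 1e6:6.1f} MB per traffic class")
# 20 km -> 2.5 MB, 100 km -> 12.5 MB, 300 km -> 37.5 MB at 100G;
# at ~40G this lands near the "1 MB per 20 km" rule of thumb Jeff cites.
```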
Slide: 08-Multicast Use Cases for Large Language Model Synchronization
Yisong Liu: Good morning, everyone. I'm Yisong Liu from China Mobile, and today I will present the multicast use cases for LLM synchronization on behalf of my co-authors.
First, I'll introduce the scenario of LLM synchronization in inference clouds. As we see in the figure, in multi-cloud LLM synchronization, a centralized model repository automatically replicates and synchronizes LLMs to distributed GPU cloud platforms. Every platform can have local storage, and the GPU clouds may be in different regions or even in different operator networks. The emerging inference cloud service will deliver large-scale real-time inference, fine-tuning, and model optimization services on the GPU cloud platforms.
For this scenario, we have two challenges. The first is high data concurrency. A popular large model, maybe 20 gigabytes to 1 terabyte in size, may be downloaded simultaneously across dozens of GPU clouds, which leads to I/O bottlenecks at the storage repository and delays model distribution at scale. The second challenge is cold-start latency: the inference service cannot start until the model is fully downloaded to the GPU cloud, and low download efficiency leads to significant cold-start latency, delaying users' access to the inference service. Although this synchronization process is separate from the training and inference procedures, it directly affects the efficiency and reliability of inference service delivery.
So why is multicast needed in this scenario? First, synchronizing large models to multiple GPU clouds is a typical multicast use case; it is obviously a point-to-multipoint pattern. Second, multicast reduces the I/O bottleneck from simultaneous downloads, improves transmission efficiency, and minimizes cold-start latency. Third, the GPU clouds span multiple regions and even multiple operator networks, so multicast technologies capable of operating across different networks, including core and metro networks, are required.
For these requirements, we have some candidate multicast technologies, with a very brief analysis here. Traditional PIM-SM requires a multicast tree to be established in advance, so all nodes along the tree maintain multicast flow state for every replication path. It is slow to respond to network topology changes, though this can be improved by fast reroute mechanisms like MoFRR, and it is suitable for scenarios where the set of destination GPU clouds is relatively fixed. The SR-P2MP mechanism relies on a controller to implement multicast traffic engineering, and the replication state on the replicating nodes, that is, the multicast tunnel, must be established beforehand. It is also slow to respond to topology changes, which again can be improved by fast reroute, and it is likewise suitable for scenarios where the destination GPU clouds are relatively fixed. BIER, as we know, is a stateless multicast technology: there is no need to establish a multicast tree, it responds quickly to network topology changes, and there is no requirement for the destination GPU cloud set to be fixed.
So we have just initiated the use case discussion for this scenario, and we would like more detailed requirements and potential-gaps discussion from the working group. Thank you.
Jeff Tantsura: Any questions, comments? No. Thank you, Yisong.
Yisong Liu: Thank you. If you have any questions or comments, please send them to the multicast-to-AI mailing list. Thank you.
Slide: 09-Requirements and Gap Analysis of Multicast in AI Data Centers
Jian Zhang: Hello, everyone. Can you hear me?
Jeff Tantsura: Yes, we can hear you. You have control of the slides. Please get started.
Jian Zhang: Yeah, thank you. Hello, everyone. I'm Jian Zhang from China Mobile, and I'm going to present our new draft, "Requirements and Gap Analysis of Multicast in AI Data Centers".
First, let's look at the multicast use cases in AI data centers. As we know, Mixture of Experts (MoE) is a popular model architecture for LLM training and inference, and it has replaced traditional dense models with sparse models with dynamically activated experts. Let's look at the macro process of the MoE layers. After being computed by the attention network, a token is dispatched to multiple selected experts decided by a gating network, and each token may be routed to different experts. With expert parallelism in distributed LLM training or inference, the token dispatch process may be implemented across different GPUs connected by networks. The token dispatch process follows a point-to-multipoint (P2MP) traffic pattern, which is a typical multicast use case.
Another case is LLM training: all-reduce is a core collective communication primitive, and the all-reduce process can be decomposed into a reduce phase and a broadcast phase, where the broadcast phase exhibits a P2MP traffic pattern naturally matching multicast semantics.
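A minimal sketch of that decomposition (the combine operation and node names are illustrative; real collectives shard and pipeline this):

```python
def all_reduce(node_values):
    """All-reduce as reduce-to-root (MP2P) followed by broadcast (P2MP)."""
    # Phase 1 (reduce): every node's contribution is combined at a root.
    total = sum(node_values.values())
    # Phase 2 (broadcast): the root sends the result to all nodes; with
    # network multicast this is one transmission replicated in-network.
    return {node: total for node in node_values}

grads = {"gpu0": 1.0, "gpu1": 2.0, "gpu2": 3.0, "gpu3": 4.0}
print(all_reduce(grads))  # every GPU ends up with 10.0
```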
Besides model training and inference, there are also tasks in AI data centers such as model distribution before jobs start and checkpoint saving during model training; this traffic is carried by the storage networks in the AIDCs. Model distribution pushes the model parameters from storage nodes to many computing nodes, and checkpoints are typically written to multiple storage targets to ensure reliability, so these are also multicast use cases.
From the above multicast use cases, we can see that AIDCs exhibit numerous P2MP traffic patterns, yet inside the DC many use unicast to implement P2MP communication. Using network multicast to offload data replication from the servers to the network devices has many benefits. For processing and latency, multicast can reduce the processing overhead and latency at the source GPUs and source NICs, and it enhances scalability by reducing source-side overhead. For transmission and bandwidth, multicast eliminates redundant transmissions from the source and improves receiver-side throughput by reducing bandwidth pressure at the source.
Now let's turn to some multicast requirements in AI data centers. The first is interactivity. As we know, P2MP forwarding and data replication are the fundamental functions of multicast, but multicast in AI data centers should also support efficient multipoint-to-point (MP2P) forwarding, particularly for feedback such as acknowledgments and congestion signals, to meet the ultra-high reliability requirements of AI data center networks; and to avoid feedback storms with numerous receivers, the MP2P forwarding should be coordinated with MP2P packet aggregation. The next requirement is reliability. As we just mentioned, the networks in AIDCs are supposed to stay lossless under normal conditions, without network hardware failures; but when the network does encounter failures such as link failures, the multicast mechanisms should support fast failure detection and recovery, and the fault handling mechanisms should confine the impact of a failure to the smallest possible part instead of the entire multicast tree. For the dynamics requirement, multicast should adapt to dynamic multicast membership changes at microsecond scale in scenarios like MoE token dispatching; it should enable instant forwarding based on real-time multicast member selection without needing predefined fixed groups, and the change overhead should be low. The next requirement is sparseness. As we know, MoE models are sparse models; for example, in DeepSeek V3, nine experts out of the 256 in total are activated, so a token activates only a small subset of the experts at a time. So multicast should be efficient at sparse member identification, and state maintenance should be lightweight. And considering the ultra-high performance requirements of AI data center networks, multicast should offer simplicity, with a simple control plane and an efficient data plane.
And here is the gap analysis of typical existing multicast mechanisms against the multicast requirements in AI data centers. For interactivity, there is no native support for efficient backward MP2P forwarding or for the packet aggregation needed to handle closed-loop feedback. Also, most existing multicast fails to meet the lossless requirement of data center networks. For the dynamics requirement, BIER is relatively good, but for the sparseness requirement, BIER's bit-string overhead reduces efficiency in sparse-membership scenarios. And for simplicity, the mechanisms should be optimized for the data center's ultra-high performance. So none of the existing multicast techniques can satisfy all the multicast requirements in AI data centers; a new architecture or deep enhancements to the current techniques are needed. We seek feedback and welcome discussion. Thanks for your attention.
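A toy calculation behind the sparseness point: BIER's bit string has one position per possible receiver, so a sparse group pays for all the zeros. All numbers here are illustrative:

```python
def bier_vs_explicit(bitstring_len, group_size, id_bits=16):
    """Compare a fixed BIER bit string against an explicit receiver-ID list."""
    return bitstring_len / 8, group_size * id_bits / 8

bs, lst = bier_vs_explicit(bitstring_len=1024, group_size=9)
print(f"BIER bit string: {bs:.0f} B; explicit 16-bit IDs: {lst:.0f} B")
# BIER bit string: 128 B; explicit 16-bit IDs: 18 B
# -> for 9 receivers out of 1024, the fixed bit string costs ~7x more
#    header bytes, which is the inefficiency noted above.
```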
Jeff Tantsura: Tony, go ahead.
Tony Przygienda: I hope you can hear me. I already wrote it in the chat, but I like this envelope, the gap analysis; that's a very good graph. BIER has a solution for the sparse stuff, but of course the size of the sparse group matters. There is a draft you can look up; it's called U-BIER, for uncompressed BIER, or something like that. U-BIER. Thanks.
Jian Zhang: Yeah, thanks for your advice on BIER. Maybe there are some sparse-membership mechanisms that we should also optimize for the ultra-high requirements of AI data centers.
Jeff Tantsura: Any other questions? Comments? No. Thank you for the presentation.
Jian Zhang: Thank you.
Kemin Liang: Good morning, everyone. Thanks to the chairs for the opportunity to present today in Shenzhen. I'm Kemin Liang from Xidian University, and it's my great honor to present our work. My topic is Symmetry-driven Asynchronous Forwarding with Fast Reroute for LEO Networks. My talk is organized in five parts.
Micro-loops are generated during the IGP convergence process, triggered by link failures and asynchronous SPF updates with forwarding inconsistency, and they can cause packet loss, congestion, and reduced FRR effectiveness. RFC 5715 confirms that micro-loops are prevalent during topology changes; they can cause congestion and repair starvation. So micro-loops are common in networks generally, but how does this play out in LEO networks? In LEO satellite networks, due to the more dynamic topology, longer propagation delays, and limited onboard resources, micro-loops can cause non-negligible damage. There are two drivers. First, planned disruptions: orbital motion causes ISL changes, and sun avoidance causes laser link shutdowns. Second, unplanned failures: high-energy particles can cause router failures, and software faults can trigger link failures. All these changes lead to more topology changes, more frequent convergence, and more micro-loops. Micro-loops have evolved from sporadic incidents into routine events, which leads to performance degradation in LEO networks. Given this challenge, let's consider the existing approaches and onboard limitations. Current approaches such as LFA and SPF timing control are designed for terrestrial networks and fail to fully account for the space environment and constrained resources of LEO networks.
So, how can we view this problem from another angle? Our motivation comes from the topology of LEO constellations. The main constellations, such as Walker Delta, Manhattan, and twisted Manhattan, can be abstracted as a torus topology with rotational symmetry and reflection symmetry. As the picture shows, the Walker Delta constellation can be mapped to a torus topology constructed from ISLs and further simplified to a ring topology as a basic unit: the ring is the basic unit of the torus. This means we can turn rerouting and traffic protection switching into a scheme grounded in symmetry.
Our STEP mechanism can be explained by an example on a ring topology. The mechanism is straightforward: blockage triggers reverse forwarding, and the same port triggers reverse egress. For a data packet sent from node A to node D, as the picture on the right shows, if there is a link failure between C and D, the packet can detour through B, A, G, F, and E before arriving at D, without creating micro-loops.
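A minimal sketch of the ring behavior just described: the node names and the "blockage triggers reverse forwarding" rule come from the example; the function names and packet model are illustrative assumptions, not the draft's specification.

```python
# Illustrative sketch of the ring detour described above. The ring A..G and
# the reverse-on-blockage rule follow the talk; everything else is assumed.

RING = ["A", "B", "C", "D", "E", "F", "G"]  # clockwise order

def next_hop(node, direction):
    """Neighbor of `node` on the ring; direction is +1 (clockwise) or -1."""
    i = RING.index(node)
    return RING[(i + direction) % len(RING)]

def forward(src, dst, failed_link):
    """Walk the ring clockwise; on hitting the failed link, reverse direction."""
    path, node, direction = [src], src, +1
    while node != dst:
        nh = next_hop(node, direction)
        if {node, nh} == set(failed_link):
            direction = -direction          # blockage triggers reverse forwarding
            nh = next_hop(node, direction)
        path.append(nh)
        node = nh
    return path

# Packet from A to D with the C-D link failed:
print(forward("A", "D", ("C", "D")))
# -> ['A', 'B', 'C', 'B', 'A', 'G', 'F', 'E', 'D']
```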
So how does STEP work on a node? STEP has five procedures: first initializing, then detecting, triggering reverse forwarding, forwarding and translating, and finally recovery. All these operations are driven by local state and conform to symmetry rules on the local node, with no control plane participation.
The above was an introduction on a ring, so how does STEP scale to a torus? We propose two policies, counter-first and network-first, to make STEP scale to torus forwarding. Counter-first is best for intra-orbit failures, and network-first is best for inter-orbit failures.
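The exact policy definitions are not spelled out in this discussion, so the following is only a rough sketch of the torus-as-rings idea under stated assumptions: each node sits on one intra-orbit ring and one inter-orbit ring, and we assume the policy simply picks the ring containing the failed link for the STEP detour. The function names and the grid model are hypothetical.

```python
# Rough sketch of the torus-as-intersecting-rings abstraction from the talk.
# Reading "counter-first for intra-orbit failures, network-first for
# inter-orbit failures" as "detour on the ring containing the failed link"
# is an illustrative assumption, not the draft's definition.

def rings_of(node, n_orbits, sats_per_orbit):
    """A node (orbit, index) lies on one intra-orbit and one inter-orbit ring."""
    orbit, idx = node
    intra = [(orbit, k) for k in range(sats_per_orbit)]   # same orbital plane
    inter = [(p, idx) for p in range(n_orbits)]           # same index, all planes
    return intra, inter

def pick_detour_ring(node, failed_link, n_orbits, sats_per_orbit):
    """Choose the ring on which to run the ring-STEP detour."""
    intra, inter = rings_of(node, n_orbits, sats_per_orbit)
    a, b = failed_link
    if a[0] == b[0]:        # both endpoints in one orbit: intra-orbit failure
        return intra        # counter-first style
    return inter            # endpoints in different orbits: network-first style

# Intra-orbit failure on a 16x16 torus resolves to the node's own orbit ring:
print(pick_detour_ring((0, 3), ((0, 3), (0, 4)), 16, 16)[:3])
```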
Compared with existing FRR mechanisms, our proposed STEP has better performance in three dimensions: resource expenditure, convergence speed, and complexity. So we can say this is a fresh approach to satellite routing, moving from combating dynamics to leveraging topology.
We ran our simulation on a 16x16 torus with different fault probabilities. The figure on the right shows how the packet loss rate changes with the link failure probability and the node failure probability. We can see that methods like LFA and RLFA reduce packet loss, but our proposed network-first policy demonstrates better performance: a 16.5% reduction under link failure conditions and an 11.2% reduction under node failure conditions.
Then we demonstrated our experiments on a five-router ring using IPG hardware, running standard OSPFv3 with and without STEP. The experimental results show that STEP can reduce the peak congestion and can also reduce the packet loss caused by micro-loops and black holes; for example, peak congestion was reduced from 3,000 Mbps to 2,000 Mbps, and both micro-loop packet loss and black-hole packet loss improved. So we have proposed STEP and shown, through both simulation and hardware experiments, that its performance is better than the existing approaches.
And let me just summarize the proposal. STEP is control-plane independent and leverages the torus symmetry for fast convergence. And it does not modify the existing protocols, so it is easy to deploy. That's all. Thank you.
Jeff Tantsura: Questions? Comments? No. Any questions? Okay. Thank you.
Slide: 11-Congestion Control Based on SRv6 Path
Yisong Liu: Hello everyone, I'm Yisong Liu from China Mobile. And this time I'll present the congestion control based on the SRv6 path. This is the second presentation of this draft.
I will give a very quick view of the problems, the background of this draft. The first is congestion notification in the WAN. As we know, traditional PFC relies on Ethernet multicast, but in the WAN we have more complex meshed topologies, so the multicast signal is not suitable in this scenario. The second challenge is the long latency of SRv6 paths. As we know, an end-to-end SR path can span large geographic distances, maybe thousands of kilometers, which introduces significant, unavoidable latency. The third challenge is the control overhead at the SRv6 head node. A single head node may establish numerous SRv6 paths across the WAN, so there is a centralized-bottleneck risk at the SRv6 head node if we make the head node handle all the congestion control.
So, this is our SRv6 congestion control proposal, a hop-by-hop solution. The head node in this figure, PE1, encapsulates the SID list for PE1, PE2, PE3, and PE4. Each node forwards the traffic using its local SID list tables and verifies the slice-related information. When congestion happens, for example at PE3, it detects the priority queue buffer overload and sends a congestion notification message to the upstream node, in this figure PE2. The message contains the priority queue where the congestion occurred and the congestion control parameter information; this information may include the pause time and the target bandwidth, and it must contain the slice ID. The receiving node, here PE2, receives the notification and reduces the forwarding rate, or simply pauses the traffic, for that slice. But if the congestion cannot be suppressed at this node, it will detect its own local priority queue overload and in turn send a notification to its upstream node. This process can run hop-by-hop all the way to the head node. The head node can then use path rebalancing or alternative path selection within SRv6 itself, and it may also propagate the congestion control signal to the DCN, but that's another story.
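A minimal sketch of the hop-by-hop behavior just described, assuming a simple queue-depth threshold as the overload trigger; the PE1/PE2/PE3 chain follows the figure, while the threshold, rates, and all names are illustrative assumptions.

```python
# Minimal sketch of the hop-by-hop congestion notification described above.
# "Notify upstream, cap/pause the slice, escalate if still overloaded"
# follows the talk; thresholds, values, and names are assumptions.

from dataclasses import dataclass

@dataclass
class Notification:
    priority_queue: int     # which priority queue is congested
    pause_time_us: int      # requested pause, in microseconds
    target_bw_mbps: int     # requested target bandwidth
    slice_id: int           # slice experiencing congestion (mandatory)

class Node:
    QUEUE_LIMIT = 1000  # assumed overload threshold, in packets

    def __init__(self, name, upstream=None):
        self.name, self.upstream = name, upstream
        self.queue_depth = 0
        self.slice_rate = {}  # slice_id -> current rate cap (Mbps)

    def on_notification(self, notif):
        # Reduce the forwarding rate (or pause) for the congested slice.
        self.slice_rate[notif.slice_id] = notif.target_bw_mbps
        print(f"{self.name}: slice {notif.slice_id} capped at "
              f"{notif.target_bw_mbps} Mbps for {notif.pause_time_us} us")
        # If this node's own queue is still overloaded, escalate one hop
        # upstream, so the process runs hop-by-hop toward the head node.
        if self.queue_depth > self.QUEUE_LIMIT and self.upstream:
            self.upstream.on_notification(notif)

# PE1 (head) <- PE2 <- PE3: PE3 detects overload and notifies upstream PE2.
pe1 = Node("PE1")
pe2 = Node("PE2", upstream=pe1)
pe3 = Node("PE3", upstream=pe2)
pe2.queue_depth = 1500  # PE2 is also overloaded, so it escalates to PE1
pe3.upstream.on_notification(Notification(priority_queue=3, pause_time_us=100,
                                          target_bw_mbps=400, slice_id=7))
```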
As an update from the previous version, we have simplified the message format. In the previous draft we put the SRH in the packet format, but in some situations, like binding SIDs or C-SIDs, it is difficult to distinguish which upstream node the notification message should be sent to. So, we now give a very simple message format for the notification. As a next step, we will add more description and more procedures for how to convey the path information. This is the congestion notification message format. It contains the flags, which are undefined for now, though in the future we may define some special flags; the priority field, a queue priority identifier in which each priority queue occupies one bit; the arguments for the congestion control parameter information, like the pause time as the default parameter, and in the future we can define the forwarding rate to be reduced and a target bandwidth indicating the expected suppression; and the slice ID, the identifier of the slice experiencing the congestion.
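To make the field layout concrete, here is a loose sketch of packing such a notification. The draft's actual on-the-wire field widths are not given in this discussion, so every size below is an assumption for illustration only.

```python
# Loose sketch of serializing the notification fields named above (flags,
# priority bitmap, congestion-control arguments, slice ID). All field widths
# are illustrative assumptions; the draft defines the real wire format.

import struct

def pack_notification(flags, priority_bitmap, pause_time_us,
                      target_bw_mbps, slice_id):
    # flags: 1 byte, reserved/undefined for now
    # priority_bitmap: 1 byte, one bit per priority queue
    # pause_time_us / target_bw_mbps: 4 bytes each (network byte order)
    # slice_id: 4 bytes
    return struct.pack("!BBxxIII", flags, priority_bitmap,
                       pause_time_us, target_bw_mbps, slice_id)

msg = pack_notification(flags=0, priority_bitmap=0b00001000,  # queue 3
                        pause_time_us=100, target_bw_mbps=400, slice_id=7)
print(len(msg), msg.hex())  # a 16-byte example message
```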
So we want to seek more review and feedback from the working group. That's all. Thank you.
Jeff Tantsura: Any comments? Questions? All the discussion is happening in the chat. Someone is in the queue: Yixuan.
Yixuan Wang: Yixuan Wang from Tsinghua University. Thank you for your presentation, and here's my question. One of the motivations is that a single head node may be responsible for managing numerous SRv6 paths, and placing congestion control at the intermediate nodes can alleviate the load on the head node. However, have you considered a scenario where multiple downstream nodes send congestion notification messages back simultaneously, and they are received at the same upstream node? Could this lead to increased load or excessive congestion control on one of the upstream nodes?
Yisong Liu: We focus on the SRv6 path, and the congestion control notification is only sent one hop at a time. I don't understand what you mean by multiple downstream nodes. Could you repeat your question?
Yixuan Wang: Oh, in this scenario, for one specific upstream node, is there only one downstream node?
Yisong Liu: Every downstream node only sends the notification directly to its upstream node.
Yixuan Wang: So you mean there is no possibility that multiple downstream nodes send the congestion notification message to the upstream node simultaneously?
Yisong Liu: Every downstream node only sends the notification directly to its own upstream node on the path, so it only concerns itself.
Yixuan Wang: Okay, understood. Thank you.
Jeff Tantsura: Thank you. Any other questions? Okay. Thank you.
So, given the focus and the amount of work spent on AIDC, I think we should be considering an interim before Vienna. There is a lot of activity and a lot of changes, so we'll notify you accordingly. For today's meeting, we have finished all the presentations on the agenda ahead of time; we did have a small buffer, so now we have open mic time. If you have any questions or comments for the working group, or any future discussion topics, please speak up. Wake up, it's just Thursday.
So, thank you everyone for attending, and we'll see you in Vienna, or hopefully at an interim sometime in between. Thanks to David for taking the notes. For people who spoke at the mic, please check the notes to make sure your comments were recorded correctly. If you have ideas for an interim or anything else, talk to us. And there is a fascinating discussion on the list- I mean, not on the list, in the chat. I think we should bring it to the list and continue there. There are a lot of really good points, and many of us have been around long enough to have seen similar things long ago, so please continue on the list. Thank you.