
Session Date/Time: 17 Mar 2026 06:00

Susan Hares: ...minutes of requests, so we've packed the agenda, and I'm going to breeze through the status as quickly as I can. Please read the Note Well; you're bound by it. It says that you need to be professional, that when you participate you have IPR obligations, and that these procedures exist to further the goals of the Internet. Read the details yourself.

Okay, the IDR status is on GitHub. Look at it; it's probably more interesting than I am. But I'll give you the two-minute brief. We sent draft-ietf-idr-vpn-ora to the IESG. We are in the middle of post-Working Group Last Call shepherd write-ups on six additional drafts. One is draft-ietf-idr-sr-policy-ifit and R-P, which I expect to go quickly, along with draft-ietf-idr-nhc and the derived community draft. The other three will take a few tweaks, but we have every confidence that within a month we'll have another three at the IESG. Bon appétit, Keyur, for reading.

Okay, I'm not... there it goes. What we really need your help on: if you want an adoption call for new things, please, please comment on the existing calls, because we need to finish those before we can get to adoptions. Even if it isn't your draft, the best way to get your draft adopted is to help us finish the Working Group Last Calls, because we work in order: post-Working Group Last Call documents to the IESG, then Working Group Last Calls, then adoptions. If you want your adoption, please comment. That's my plea.

This is the schedule. The schedule for today is on the web, so follow along as you can. We have a data center focus at the beginning. After that, we have flexible inter-domain algorithms for SRv6. Then we'll have a bit of multicast from Jeffrey Zhang, and then we'll swap over to SR SID list optimization with Zafar. Zafar, are you here online or in person? Okay, we'll find out. Hopefully it's fun. And last, we have security. At the end of the day, after you've got your coffee, we're going to have a charter discussion. Now, brief news on the rest of the week: on Wednesday (that's tomorrow morning) I'm holding an editing session for those who have Flowspec v2 drafts, because we expect to finalize the base draft in April and May and then start adopting, and we have a lot of adoptions queued. And on Friday, in the core area, there's a proposal for an interface capability on the OPEN, plus multicast information, SR TE drafts, BGP-LS drafts, NRP exceptions, SAVNET, BGP-LS, and Flowspec. So you don't want to miss the week; it's going to be full of fun.

Okay, the charter. Maybe you'll get bored in the middle of somebody's talk; might be mine, you know. But there's the charter. Look at it. The style of the charter is different. Every IESG is different, in case you didn't know: a different group of people with different requirements, so they shape charters differently for us. This one is detailed; five or six years ago they wanted less detail. I don't care; we just give them what they'd like. What doesn't change: the two-implementations requirement isn't changing, and the work we're doing isn't changing. They're just giving you a wonderfully detailed list. Read it; maybe it'll be fun.

One concern we'll talk about, because lots of you have come up and talked to me about it, is how IDR maintains a good BGP protocol, with scaling and interoperability, when a lot of AFI/SAFIs shove information through it. I've put the comments there at the bottom: "This information has nothing to do with routing," "Why is it in BGP?" "Why do they get to break BGP while we suffer?" This is the time to raise your comments. The work hasn't really changed, but if there's something we need to know, this is the time: you can talk to me, you can talk to Keyur, you can get online and talk to our AD, or to Jeff. Okay. Next is Kevin. It's your turn.

Keyur Patel: Do you want to use...

Kevin Wang: Just a minute. Do we have it up? Good afternoon. My name is Kevin Wang. I'm from Juniper Networks, which is now part of HPE.

Susan Hares: Okay, I'll get it. Hold on.

Kevin Wang: And the draft I'm going to talk about is BGP Deterministic Path Forwarding (DPF). It's this one. Right.

Susan Hares: Yeah. Okay, just a minute.

Kevin Wang: All right, I think I'm good with this slide. You have to give him the clicker. Yeah.

Susan Hares: It's back. Now it's working.

Kevin Wang: Okay. All right. As we know, today's data centers usually use a Clos topology with IP forwarding, and usually BGP is the only routing protocol used. This design is simple for a reason: to reduce cost and maximize interoperability. But whenever you go simple, there are limitations. In this simple design, you don't really have control over flows; they're usually randomly hashed to maximize ECMP at every stage. BGP DPF, on the other hand, tries to separate a physical IP fabric into multiple logical, or colored, fabrics. We can then map flows to different colored fabrics to achieve load balancing or differentiated SLAs, or to avoid fate sharing. It's lightweight traffic engineering for IP fabrics.

To further explain DPF, let me use a few use cases. The first is called queue-pair pinning, a use case in the AI/ML fabric. Today, when one GPU wants to send data to another GPU, we usually use RDMA, and RoCEv2 can be used to carry those flows. When sending a big chunk of data, we also allow partitioning it into N equal smaller chunks for better granularity; each chunk is identified by a queue pair in the RoCEv2 header. So if we map different flows to different colored fabrics, in theory we can make sure each colored fabric has perfect load balancing, which helps avoid congestion proactively.

A second use case is multi-tenancy for GPU-as-a-service. Say we have a red tenant and a blue tenant with different SLA requirements, paying different prices. If the fabric is partitioned into a red fabric and a blue fabric, and we map the red tenant to the red fabric and the blue tenant to the blue fabric, their traffic is isolated, they do not interact with each other, and we can achieve different SLAs for different categories of tenants.

There are also cases where people want to avoid fate sharing. One example is again in the AI/ML fabric: you may want an exception flow on the green fabric while your data flows go over the red and blue fabrics, so the exception flow doesn't share fate with the data flows. Another use case is in factories, where people usually use PRP, the Parallel Redundancy Protocol, to control robots. PRP replicates each packet into two identical copies for redundancy, so if one copy is lost, the robot keeps working. However, this kind of redundancy only works if both copies go through disjoint paths. So in this example, if we map one copy to the blue fabric and the other to the red fabric, we maximize the redundancy.

The way we color the fabric is by coloring the EBGP sessions: we attach a color community to each EBGP session. If we want to color the link from Spine 1 to Leaf 2 red, we color the EBGP session on top of it with a red color community. Once the session is colored, it only allows routes with the matching color through. The only exception is uncolored routes, usually used for protocol control traffic, which match any fabric.

There are two modes for coloring a session. The first is strict mode, where we use a session color capability in the BGP OPEN message to negotiate. If you configure the session as red on one end and accidentally as blue on the other, negotiation fails and the session does not come up. That works well because you detect the misconfiguration early. The disadvantage is that with today's negotiation, whenever you change the color you need a new OPEN message, so you have to flap the BGP session, unless you use the dynamic capability mechanism, which is not widely deployed today. To address this, we also allow a loose mode, where misconfigured sessions do come up, and you instead rely on route-to-color matching to detect the misconfiguration. Again, if a session is configured red on one end and blue on the other, a red route is allowed to be advertised from the red end, but once it's received on the blue end, it is rejected.
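
A minimal sketch of the two session-coloring modes described above; the names and values here are illustrative, not the draft's actual capability codes or community encodings:

```python
# Illustrative sketch of DPF session coloring, assuming made-up color values.

STRICT, LOOSE = "strict", "loose"

def open_negotiation_ok(local_color, peer_color, mode):
    """Strict mode: a color mismatch in the OPEN exchange keeps the session down."""
    if mode == STRICT:
        return local_color == peer_color
    return True  # loose mode: the session comes up regardless

def accept_route(route_color, session_color):
    """Loose mode catches misconfiguration at route exchange time.
    Uncolored routes (e.g., protocol control prefixes) match any fabric."""
    if route_color is None:
        return True
    return route_color == session_color

# The red/blue misconfiguration from the talk:
assert not open_negotiation_ok("red", "blue", STRICT)  # session stays down
assert open_negotiation_ok("red", "blue", LOOSE)       # session comes up...
assert not accept_route("red", "blue")                 # ...but red routes are rejected
assert accept_route(None, "blue")                      # uncolored routes pass
```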

Once the sessions are colored, we can color the routes to express the intent of how a route should be sent. The simplest case assigns each route one color. Here I have two routes, one red and one blue: the red route is sent over the red sessions only, and the blue route over the blue sessions only. At the ingress, the red route goes through the red spines only and the blue route through the blue spines only.

In some cases people say: if my red fabric fails, I don't want the packets simply dropped. So you can specify a primary color and a backup color. Here the route on Leaf 2 is given red as primary and green as backup. We send the route over the red sessions with the red community, and in addition over the green sessions with the green community. We also attach an AIGP attribute with metric 0, starting from Leaf 2, to tell the ingress and intermediate nodes that red is the primary. So on Leaf 1, when both paths are received, it knows red is the primary and green is the backup.

There are also cases where people don't want just a single backup color, because when the primary fabric fails, all the flows that used it are suddenly dumped onto a single backup fabric, which can easily congest it. To avoid this, we have a mode where you specify one color as primary and the rest of the colors as a wildcard backup. In this case, the route on Leaf 2 is sent over the red session with the red community and an AIGP of 0, and it's also sent over the green, blue, and purple fabrics with the corresponding communities. At the ingress, the primary path is a single color, while the backup path is an ECMP set over three colors. If the red fabric fails, the red flows are evenly (or at least intentionally evenly) hashed across the remaining three fabrics, so each takes only a fair share of the backup traffic without causing congestion.

The last mode is signaling a route with all colors: it is simply sent over each colored session with the corresponding color community. At the ingress you see an ECMP set, not very different from today's best-effort route, except that it clearly tells you the color of each next hop. The ingress can then use tools like firewall filters to decide which flow maps to which color, with the remaining colors as backup.
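
As an illustration of the selection logic across these modes, here is a hedged sketch of how an ingress might group received paths into a primary set and a backup ECMP set using the AIGP-metric-0 convention from the talk; the field names are assumptions, not the draft's encoding:

```python
# Paths are dicts like {"color": "red", "aigp": 0 or None}; AIGP 0 marks primary.

def split_primary_backup(paths):
    primary = [p for p in paths if p.get("aigp") == 0]
    backup = [p for p in paths if p.get("aigp") != 0]
    return (primary, backup) if primary else (backup, [])

# Primary red with a wildcard backup over the remaining three fabrics:
paths = [{"color": "red", "aigp": 0},
         {"color": "green", "aigp": None},
         {"color": "blue", "aigp": None},
         {"color": "purple", "aigp": None}]
primary, backup = split_primary_backup(paths)
assert [p["color"] for p in primary] == ["red"]
assert len(backup) == 3  # ECMP over green/blue/purple if red fails
```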

Kevin Wang: Yeah, that ends my presentation; I was able to finish it. As I said, this draft was just published last December, so it's currently version 00.

Susan Hares: Jeff, do you want to use the clicker? I've got Nan in front of you in the queue. Nan, did you want to come to the mic? You're in the queue.

Kevin Wang: We got some comments, which will be incorporated into the next version, 01, after this IETF. In the meantime, we welcome comments as well as contributions. Thank you.

Nan Geng: Hello, Nan Geng from Huawei Technologies. Thanks for your introduction. Just a small question: there may already be routing policies on our routing nodes. Is there any influence on the existing routing policies if we use the color mechanism? Thank you.

Kevin Wang: Yeah, very good question. First of all, as I said, in these data centers people usually want to keep things as simple as possible, so the intent is to minimize policy. However, if you do have policy alongside this coloring, the policy takes precedence. I may color a route red, but if you use an explicit policy to change it to green, your policy wins.

Susan Hares: Next we have Sandy, I think it's Sandy Zhang. Yep.

Sandy Zhang: Sandy Zhang, ZTE. It looks to me like this introduces multi-domain, multi-plane concepts into BGP. From my personal perspective, I think we could control it only with the extended community on the route, not on the session; maybe that's more flexible. Please consider it.

Kevin Wang: Let me see whether I got the question. You do know that we color both the session and the route, right? So what is your suggestion? Can you repeat it?

Sandy Zhang: Because, maybe from the first version of this draft, it looks like all the BGP sessions can be built by color community, right?

Kevin Wang: Right.

Sandy Zhang: Yeah. Then my question is whether we could consider the color just for route restriction or computation, not for session re-establishment. Yeah.

Kevin Wang: Yes. I'm wondering whether what you're describing is the loose mode, where we don't control session establishment based on the color. We do allow the session to come up even if the colors mismatch; instead, we just map routes against the color of the session on both ends.

Sandy Zhang: Okay. It looks like in strict mode, the session will be controlled by the color, right?

Kevin Wang: Yeah, that's correct.

Sandy Zhang: Yeah. So that's optional, and we can choose not to break the establishment between the BGP peers, right?

Kevin Wang: Exactly.

Sandy Zhang: Okay. Thank you.

Kevin Wang: Also, for the future, we have a pretty old draft on the dynamic capability mechanism. Once that gets more popular among vendors, we could use it to negotiate dynamically without flapping the session. Thank you.

Susan Hares: Jeff, it's your turn. I took myself out of the queue.

Jeff Tantsura: Jeff Tantsura, Nvidia. The backup: is it a BGP construct or a data plane construct?

Kevin Wang: Jeff, you need to back up a little from the mic; it's a little hot.

Jeff Tantsura: The backup route: is it implemented as a construct in BGP, or do you need to run best path to use the backup routes, or do you download all of them and create two next-hop groups, one primary and one backup? Where is it implemented?

Kevin Wang: You mean this example, right? You have a primary and a backup, so where is it implemented? Well, most routers today have FRR capability: you can have a primary next hop and a backup next hop. If that's your question, it's implemented in the routing protocol daemon itself. Of course, different vendors differ on the details.

Jeff Tantsura: No, a more specific question. Would you accept all colors in BGP, install them as equal cost, and then, based on the difference in community, create two different next-hop groups? Is that a correct understanding?

Kevin Wang: Okay, I see your point. That is very much an implementation detail, and to be honest, even within HPE and Juniper we have different ideas.

Susan Hares: Okay, I'm going to have to ask you to take this to the list. And would you answer something for me on the list, since we don't have time: why did you go with a capability in the OPEN when we have a lot of other color mechanisms? Just why you went that direction. Okay? I'll send the question to the list.

Kevin Wang: Sure.

Jeff Tantsura: Can I have just 20 seconds? I think you need to be very specific, because in the first case you are going to advertise those routes to your peers, and in the second case you won't. So it has to be in a standards-track document; you cannot just keep it as a private detail.

Kevin Wang: All right. I will talk to you offline about this.

Susan Hares: You need to take this back. I am done. Okay, we're going to go to Jeff Haas's slides. Thank you. Jeff, you should have control.

Jeff Haas: Okay, good morning. This will be a very brief update on the BGP YANG model, draft-ietf-idr-bgp-model. And I'm waiting for the clicker control to be passed over; otherwise, you could simply hit next if you want to. Do you have it or not? I can do it for you.

Okay. So, our prior status: we had last gone through a Last Call and a bunch of edits back in April 2023, and we ended with weak support. YANG models are very complex and very difficult to review, so this is not terribly surprising. We published version 18 in October to try to address the bits of Last Call feedback we actually had, and we moved into waiting for implementation, because we normally require two implementations. That said, why are we doing anything at all? Well, the Broadband Forum liaison told us back in November that they wanted to make use of the BGP model in some of their own work. We have also very recently seen another draft, from NMOP, that wants to make use of this work. And our current AD basically said, "No, let's actually get this thing published." That means we're shipping without implementations. So we spent some energy on last-minute cleanup: an audit against the existing OpenConfig models and a few other things. The result was an eye chart of about 48 small items fixed across the last couple of weeks. If you care about the details beyond what shows up in the RFC diff, please take a look at the draft's GitHub, where you can follow all the micro-details as we work through the process.

The high-level changes most people would care about: there have been some reorganizations. This would be considered a breaking change if this were a published model; mostly the changes are in where some of the leaves are placed. One thing we spent energy getting correct (thanks to Maria and Mahesh) is the regular expressions for communities; those should now be fixed. The filtering language we published in the last version now has an ABNF, so it should be possible to implement a version of it if you don't have code that currently understands RFC 181-style AS path regular expressions. For everything else, please see the datatracker diff.

What is left? Well, the draft-ietf-idr-bgp-rpki-yang work that you saw from NMOP earlier basically pointed out that we are using submodules, which are somewhat controversial within the YANG community as to when they're appropriate. One side effect is that they're effectively a private include within YANG, so you can't use them outside the namespace of the BGP module. The NMOP draft is looking to expose these things, so we have been encouraged to open that up. There will be a little continuing discussion with their authors to figure out exactly what they need and to quantify the work. That set of conversations, along with the pressure from the AD and the general community, will determine whether we ship this to RFC before addressing these items or not. The goal is to at least address the quick points.

Past that is deciding whatever's left to review. And a last point, for myself: I have been working on this for several years at this point and am looking to stop having it haunt me. So I will be hanging up my hat for the IETF BGP YANG-related roles once this draft is published. We have maybe 30 seconds for questions if anybody has any; otherwise, let's take it to the list.

Susan Hares: We have no one in the queue. Would anyone like to ask questions? Okay, we'll go on to the next speaker. Yep. Thanks, Sue.

Zhuang Rui: Good afternoon, everyone. I am Zhuang Rui from China Mobile. Today I'm going to introduce our draft, BGP Port Extended Community for AIDC. First, the background. It is well known that in AI data centers, AI tasks are highly sensitive to congestion and the packet loss it causes. On the network side, the gray area in the picture, we have various methods to minimize congestion and packet loss, such as packet spraying across multiple equal-cost paths. But on the endpoint side, when multiple AI tasks execute simultaneously, a lot of traffic aggregates on the last hop, the link between the destination leaf switch and the destination server, resulting in serious packet loss due to congestion. So we need an effective method to avoid last-hop congestion and improve the efficiency of AI task execution.

In some implementations, to avoid last-hop congestion, the sending side negotiates with the receiving side's leaf switch before sending traffic. For example, in the picture shown before, Server 3 needs to negotiate with Leaf 1 or Leaf 2 before sending traffic to Server 1. This means the source server cannot send traffic until the receiving leaf switch allows it; only after the receiving leaf switch grants permission can the sending server begin. In existing implementations, the sending side (the source server or its leaf switch) needs to know the port ID between the destination leaf switch and the server in advance, so it can negotiate before sending AI task traffic. So we need to give the sender the information it needs to negotiate before sending traffic. To achieve this, BGP needs to be extended to include port information for the destination leaf switch. This is the main contribution of this draft.

So, here comes our proposed solution. The main contribution of this draft is to leverage BGP extended community attributes: we design a new subtype which carries the port ID between the leaf switch and the server. On the left, two pictures give the formats of the extended communities: a Transitive IPv4-Address-Specific Extended Community and a Transitive IPv6-Address-Specific Extended Community. In those formats, the subtype indicates that this is the Root Port ID Extended Community. The Global Administrator is set to the IPv4 or IPv6 address of the switch that advertises the server route; this address can also be used as the loopback address for establishing the BGP connection. The Local Administrator is set to the ID of the port connecting the switch and the server.
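
A minimal encoding sketch of the IPv4 form described above; the sub-type value is a placeholder, since no IANA assignment exists, and the field layout simply follows the RFC 4360 IPv4-address-specific format:

```python
import ipaddress
import struct

TYPE_TRANS_IPV4 = 0x01   # Transitive IPv4-Address-Specific (RFC 4360)
SUBTYPE_PORT_ID = 0x00   # placeholder; actual value would be assigned by IANA

def encode_port_id_ec(switch_ip: str, port_id: int) -> bytes:
    """Global Administrator = advertising switch's IPv4 address;
    Local Administrator = ID of the port toward the server."""
    ga = int(ipaddress.IPv4Address(switch_ip))
    return struct.pack("!BBIH", TYPE_TRANS_IPV4, SUBTYPE_PORT_ID, ga, port_id)

ec = encode_port_id_ec("192.0.2.1", 17)
assert len(ec) == 8  # IPv4-address-specific ECs are 8 octets
```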

With this design, when the leaf switch advertises routes to the server, the advertisement includes the Root Port ID Extended Community, which travels along with the route. Upon receiving a route carrying the Root Port ID Extended Community, a leaf switch checks whether the address is reachable. If it is not reachable, the extended community is ignored. If it is reachable, the address and the port information are stored locally or sent to the server; note that the storing or sending procedure is outside the scope of this draft. Two more things to note: first, the advertisement scope is limited to one port; second, because there are a large number of ECMP links in the network, the ADD-PATH function needs to be enabled to ensure that the route is not aggregated and this information is not lost.
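
A hedged sketch of that receive-side handling; since storing or pushing the entry to the server is out of scope in the draft, the store step below is only a stub:

```python
def on_route_with_port_ec(switch_addr, port_id, reachable_addrs):
    """Record the switch/port pair only if the advertising switch is reachable."""
    if switch_addr not in reachable_addrs:
        return None                       # unreachable: ignore the community
    return store_port_info(switch_addr, port_id)

def store_port_info(switch_addr, port_id):
    # Placeholder: keep locally or send to the attached server (out of scope).
    return (switch_addr, port_id)

assert on_route_with_port_ec("192.0.2.1", 17, {"192.0.2.1"}) == ("192.0.2.1", 17)
assert on_route_with_port_ec("192.0.2.9", 17, {"192.0.2.1"}) is None
```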

Here is a simple example. The picture on the right shows the negotiation procedure between Server 3 and Server 1. First, the routes advertised by Leaf 1 and Leaf 2 to Server 1 carry the Root Port ID Extended Community. Then, when Server 3 wants to send AI task traffic to Server 1, it first negotiates with the leaf switches connected to Server 1, here Leaf 1 and Leaf 2. If negotiation with Leaf 1 succeeds, traffic is sent to Server 1 through Leaf 1; if it fails, Server 3 negotiates with other leaf switches, such as Leaf 2 in this picture. So when a server wants to send a large volume of AI task traffic, it first negotiates bandwidth with the leaf switches, based on the destination switch and port information obtained from BGP, and traffic is sent only after successful negotiation, avoiding the many-to-one congestion and packet loss mentioned before. Note that the negotiation process is also outside the scope of this draft.

So, that's the main work in the current version of this draft. Going forward, we are planning to enrich and improve it. If you have any valuable comments or suggestions, or have better ways to advertise this port ID, please feel free to suggest and discuss. Thank you very much.

Susan Hares: I don't see anyone in the queue. Would you like to add some... Krity, put yourself in the queue, but come on up. Jeff, you're next. Just stand up. Did I not put myself in the queue? Yep, go ahead. Don't worry.

Krity: On the problem statement slide, you have a statement that says "BGP needs to..." I don't think BGP needs anything. Just saying.

Zhuang Rui: Thank you. This is just... we did some work with the BGP extended community in our global Ethernet work.

Krity: No, I understand the problem statement. I understand that the ingress might need to know what the port is, but it doesn't have to be BGP. That's all I'm saying.

Zhuang Rui: Yes, thanks for your valuable suggestion. We will also consider other ways to fulfill this requirement. Thank you.

Jeff Tantsura: Jeff Tantsura, Nvidia. I assume you'll be attaching the community to a redistributed subnet that identifies the link from the leaf switch to the server. When you reach a certain scale, you would really like to aggregate all the routes on the leaf, right? As you go toward 100,000 endpoints, your routing table becomes unmanageable. How would you deal with that?

Zhuang Rui: Sorry, I didn't fully understand your question. Can you explain it more specifically?

Jeff Tantsura: You attach community to a route, right? That's the vehicle to distribute communities.

Susan Hares: Can I ask you both to take this offline? Yeah, thank you. Maybe you can talk to him afterward and discuss it; we're just tight on time today. So, Jeff, if you would do that, I'd appreciate it. Sure. Just explain how you would aggregate routes while still using this. Thank you.

Xiao Hu: Hello everyone. I'm Xiao Hu from China Mobile Cloud. This talk covers a brief update to Fully Adaptive Routing Ethernet using BGP. Major changes since the last version: a new co-author and a new consideration for the multihoming scenario. In the previous version, there was no special consideration for multihoming. For example, in the topology shown in this figure, when Leaf 1 receives two ECMP routes from Spine 1 and Spine 2 respectively, Leaf 1 will calculate the same weight for both routes even though the two routes are associated with different path bandwidth values. The reason is that Leaf 1 simply determines the weight by taking the minimum of the bandwidth of the link toward the advertising node and the path bandwidth value associated with the received route. Now, with special consideration for multihoming, for the same topology and the same ECMP routes, Leaf 1 calculates different weight values for the two routes, because it doubles the bandwidth value of the upstream link toward the advertising node before taking the minimum of the upstream link bandwidth and the path bandwidth value associated with the received route. In this way, the total bandwidth resource is utilized more efficiently, which is very important for AI networks. Currently, at least two vendors have implemented this protocol, and at least four mainstream switch chips can already support it. The co-authors believe the current version of the draft is ready for a Working Group adoption call. Any comments?
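
A small sketch of the weight computation as described; the doubling rule comes from the talk, while the function and field names are assumptions:

```python
def path_weight(upstream_link_bw, path_bw, multihomed_upstream=False):
    """Weight = min(upstream link bandwidth, path bandwidth); with the new
    multihoming rule, the upstream link bandwidth is doubled first."""
    effective_link_bw = upstream_link_bw * 2 if multihomed_upstream else upstream_link_bw
    return min(effective_link_bw, path_bw)

# Two ECMP routes with different path-bandwidth values behind a 100G uplink:
assert path_weight(100, 150) == path_weight(100, 250) == 100  # old: identical weights
assert path_weight(100, 150, True) == 150                     # new: weights differ
assert path_weight(100, 250, True) == 200
```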

Stephane Litkowski: Yeah, Stephane from Cisco. I think this is a comment I've already made. What is the justification for creating a new extended community to do this, when multiple vendors already do it with the existing link bandwidth community? The use case is already solved.

Xiao Hu: Okay, this is a good question; it has come up many times when I've discussed this protocol with vendors. The reason for inventing this new extended community is that when we build a five-stage Clos network rather than a three-stage Clos network, the single link bandwidth extended community is not enough to accurately identify the...

Stephane Litkowski: Why? It's just a carrier. You can put whatever value you want in it. It could reflect just the local links you have; you can accumulate. All these things are already done. An extended community is just a carrier.

Xiao Hu: In a word: I have discussed this with a co-author of the link bandwidth draft, and after that discussion he also acknowledged the need for this new path bandwidth extended community.

Stephane Litkowski: But look at the EBGP DMZ draft; it already defines some of these use cases using the link bandwidth community.

Xiao Hu: Okay. We can talk offline.

Stephane Litkowski: Yeah, absolutely. But my worry is really that we will overcomplexify things. You could end up receiving BGP updates carrying the link bandwidth community plus this one, and you'd have to do some magic between the two. Honestly, why?

Xiao Hu: Yeah, many vendors also didn't see the need for this new extended community at first. But after deeper discussion, they realized the single link bandwidth extended community is not enough, especially for a five-stage Clos network.

Jeff Tantsura: Jeff Tantsura. I 100% agree with Stephane's comment. I think you really need to provide justification. The DMZ community is already deployed at huge scale; it's deployed everywhere, and replacement is an expensive operation. It's also interesting, if you have both, how they interact with each other, because eventually you need to migrate. All of that needs to be in the document, because the cost of change is really significant.

Xiao Hu: Okay, this talk is just a brief update to the current version. If you're interested in the original motivation for the path bandwidth extended community, we can talk offline, okay?

Jeff Tantsura: No, it's not okay. If deployed at scale, if you want to propose something new, justify it.

Xiao Hu: You can imagine... imagine whether you could use a single link bandwidth extended community for a five-stage Clos network. Okay.

Susan Hares: Oh, are you going... I'm sorry, guys, are you going on to the next presentation? I wanted to make sure you don't run out of time. There you go. Okay.

Xiao Hu: Okay. This talk covers a brief update to BGP Neighbor Discovery. Major changes include the addition of a new co-author and a new scenario: the use of OCS within data centers. The use of OCS in data centers is becoming popular, especially for AI scale-up networks, where the spine switches in a leaf-spine topology can be replaced by OCS devices. Reasons for this choice include, but are not limited to, higher reliability and lower power consumption, since there is no OEO transition and there are no optical modules on OCS devices, and very good multi-tenancy isolation. Given the dynamic reconfiguration of the links between leaf nodes in this topology, it would be beneficial to use a BGP neighbor discovery mechanism, especially for network automation purposes. So far, two vendors have implemented this protocol, and more implementations are on the way. Here are some details. BGP neighbor discovery is enabled on the adjacent interfaces of two adjacent devices, and loopback addresses are used for BGP session establishment. The route to the peer's loopback address is learned via the BGP Hello messages and installed in the routing table, and the BGP session can then be established with the discovered peer. The co-authors believe this draft is also ready for a Working Group adoption call. Any comments?
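
A rough sketch of the discovery flow just described; the message fields and interface names are invented for illustration:

```python
routing_table = {}

def on_hello(interface, peer_loopback):
    """A Hello on a link carries the peer's loopback: install a route to it
    via that link, then bring up the loopback-to-loopback BGP session."""
    routing_table[peer_loopback] = interface
    return establish_bgp_session(peer_loopback)

def establish_bgp_session(loopback):
    # Placeholder for ordinary BGP session setup to the discovered peer.
    return f"session to {loopback} via {routing_table[loopback]}"

print(on_hello("et-0/0/1", "198.51.100.2"))
```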

Susan Hares: Go ahead, Stephane. We're real tight on time.

Stephane Litkowski: Yeah, just one quick comment, and it's more for the chairs. Is IDR chartered to create protocols that are not BGP? Because this is helping BGP, but it's not a BGP protocol.

Susan Hares: This one has a history from when we were taking on BGP auto-configuration. I don't know that we're going to be back in the business of doing BGP auto-configuration again.

Stephane Litkowski: Yeah, because especially as part of the discussion of the new charter, this needs to be clarified.

Susan Hares: Because it has a history, I let it be presented, but we put that area of work away, so I'm not sure we can go ahead and adopt it. It is appropriate for people to see it, though, as it's something we've done in the past. Okay? That's where we are, and since we're in the middle of a charter discussion, you can weigh in through the charter discussion as well. Okay? Any other comments? We'll go on to the next slides. Thank you, Xiao Hu.

Li Yan: Hello everyone. I'm Li Yan from China Mobile, and I'm glad to be here to present our draft on behalf of all the co-authors. Our draft covers BGP Extensions for Inter-Domain Flexible Algorithm with Segment Routing over IPv6 (SRv6), and it's applicable to the SRv6 data plane. This is the first presentation of the draft, so first I'd like to introduce the background. IGP Flex-Algo provides a mechanism for IGP nodes to calculate paths based on constraints, such as low latency or high bandwidth, and SRv6 can serve as one of the data plane solutions. In an SRv6 network, traffic often crosses multiple domains with the same constraints, so even across domains the path still needs to meet those constraints to satisfy the service requirements. This leads to the key point of our draft: how can we keep the Flex-Algo consistent across domains? Our document proposes a BGP extension to advertise SRv6 locators together with the associated algorithm across AS boundaries.

Here is the typical scenario. From the picture we can see that AS 1 and AS 2 belong to the same network operator, and we need a low-latency path between routers PE1 and PE2, perhaps for a particular service. In both domains, Flex-Algo 128 is defined with the same metric, for example latency. In AS 2 (oh, sorry, there is a mistake on this slide; it should be the PE2 router), the PE2 router advertises its SRv6 locator through IGP Flex-Algo, and through redistribution between the IGP and BGP, the PE1 router in AS 1 can learn this route. But the key point is that PE1 has only learned this route via the IGP shortest path; it does not know the route is related to Flex-Algo 128.

So we define a new extended community to carry the algorithm information together with the prefix in the IPv6 unicast NLRI; we call it the Flex-Algo Extended Community. It is a transitive opaque extended community, and the format is very simple: there is one 8-bit field to carry the algorithm, and we need a type code to be assigned by IANA. This is the first version, so we appreciate all comments and will update the draft accordingly. Thank you.
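
A minimal encoding sketch of the community as described: a transitive opaque extended community (type 0x03, RFC 4360) whose value carries an 8-bit algorithm. The sub-type and the layout of the remaining value octets are assumptions pending IANA assignment:

```python
import struct

TYPE_TRANS_OPAQUE = 0x03   # Transitive Opaque Extended Community (RFC 4360)
SUBTYPE_FLEX_ALGO = 0x00   # placeholder; to be assigned by IANA

def encode_flex_algo_ec(algorithm: int) -> bytes:
    """One octet carries the algorithm (e.g., Flex-Algo 128); the remaining
    five value octets are assumed reserved and set to zero here."""
    assert 0 <= algorithm <= 255
    return struct.pack("!BBB5x", TYPE_TRANS_OPAQUE, SUBTYPE_FLEX_ALGO, algorithm)

ec = encode_flex_algo_ec(128)
assert len(ec) == 8 and ec[2] == 128
```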

Krishnaswamy: Yeah, Krishnaswamy, Cisco Systems. Have you looked at RFC 9273? The color can be mapped to an algorithm, so do we really need an extended community to carry the algorithm information?

Li Yan: You mean the BGP CAR?

Krishnaswamy: No, not the CAR. The color prefix routing.

Li Yan: Oh, CPR.

Krishnaswamy: Yeah, CPR.

Li Yan: Yeah. The authors read that document carefully before we proposed this draft. We think they are different: that RFC addresses the policy scenario, and we don't think it suits the Flex-Algo scenario.

Krishnaswamy: See, the thing is, with the color you could map to any algorithm, right? So why do you really need one more method to carry the flex algorithm in an extended community?

Li Yan: Maybe color is one way, but I think the CPR RFC cannot cover this scenario.

Yujia: Hi, I'm... sorry, can I go first? Okay, thank you. I'm Yujia from Zhongguancun Lab, and I have a question about the inter-domain semantics of the algorithm ID. If we have two domains that both use Flex-Algo but assign different meanings to the same ID, do you expect strict coordination of those IDs across domains, or some form of policy translation at the AS boundary?

Li Yan: Thank you for your question; it's a good one, I think. First, I want to point out that in this format the field is not only the algorithm ID; it can also be the algorithm type as defined in, if I remember correctly, RFC 8665. It's an algorithm type, not only the ID, and you can choose which one to use. Second, if the ASes are under the same operator, we usually run the same algorithm with the same metric type and calculation. But you're right, that's not guaranteed; it may differ. In that case you can use the algorithm type, or, if the algorithm is applied differently in different ASes and you need this function, you can statically configure a Flex-Algo mapping to achieve it. There are many ways to realize that. I hope this...

Stephane Litkowski: I'll be very fast.

Susan Hares: We're running fast, but if you could make it brief...

Stephane Litkowski: Yeah, yeah, yeah. In fact, what she was highlighting is really the reason CPR was done. In a generic multi-domain, you cannot guarantee that the same algo type has the same meaning, so you need this kind of translation. But I fully agree: in a multi-domain under the same administration, an algo could work. Your proposal is okay; it's just not generic, I would say, while the CPR one is. So if you want to continue, you need to reference CPR, saying your proposal is an alternative that works for one simple use case, which is common administration. But CPR is really the ultimate solution for this.

Li Yan: Yeah, thank you. Understood. One quick answer: I think CPR, or CAR, is a general-purpose framework for intent-aware routing, while our solution is simpler, an incremental method just to handle the algorithm scenario. Thank you. If I didn't answer well, we can discuss it on the mailing list.

Susan Hares: I'd really appreciate it if you would continue this on the mailing list. We've got to go on to the next person. Thank you very much for presenting.

Susan Hares: Jeffrey, you've got to be tight today.

Jeffrey Zhang: Okay. All right. I'm presenting this BGP Signaling for Multipath Traffic Engineering Junction States on behalf of the co-authors listed there. A brief summary of MPTE: it's a technology for creating a multipath traffic engineering tunnel from one or more ingresses to one or more egresses. Its signaling runs primarily from a designated ingress, or signaling source, to each node in the tunnel, and in the backward direction. The draft so far, and this presentation, focus on signaling MPTE tunnels using BGP in environments where other signaling protocols may not be available. There's more background in draft-compel-te-mpte. Let me spend twenty seconds on this picture. We have a tunnel from R1 to R12 going through those four red paths. If you look at R2, it has two previous hops to the same router over two links, and three next hops: R3, R7, and R9. The percentage numbers indicate the load share; for example, 50% goes to R3 and 25% to R7. So the signaling needs to get this information to R2 and to every router on the tunnel.

The way we do it: we use a new SAFI, which we call the MPTE SAFI. The NLRI encodes the identification of the DAG (another name for the tunnel) and the junction, plus some other information like version or bandwidth. The previous hops and next hops, P-hops and N-hops, are carried in the Tunnel Encapsulation Attribute. Each route is targeted at a particular node. The routes are propagated by the BGP infrastructure, either hop by hop or via route reflectors. A route target is used to target a route at a particular junction, and propagation stops once it reaches that target.
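
A loose sketch of that targeting behavior; every identifier here (junction names, the route target format) is invented for illustration:

```python
def process_mpte_route(route, local_junction_id):
    """Consume the junction state if the route target names this node;
    otherwise keep propagating toward the target."""
    if route["route_target"] == local_junction_id:
        install_junction_state(route)   # P-hops/N-hops from the tunnel attribute
        return []                       # propagation stops at the target
    return ["propagate"]

def install_junction_state(route):
    print("junction", route["route_target"],
          "p-hops", route["p_hops"], "n-hops", route["n_hops"])

route = {"route_target": "R2",
         "p_hops": ["R1-link1", "R1-link2"],
         "n_hops": ["R3", "R7", "R9"]}
assert process_mpte_route(route, "R2") == []          # R2 installs, stops propagation
assert process_mpte_route(route, "R5") == ["propagate"]
```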

I want to give some quick background on the Tunnel Encapsulation Attribute. It was originally designed to specify the encapsulation information for tunnels to the protocol next hop in a BGP update. In this picture we have two PEs, PE1 and PE2, with three tunnels from PE1 to PE2; I intentionally listed Label Switched Path 1, Segment Routing Path 1, and GRE Tunnel 1. Say any of those tunnels could be used: PE2 puts that information into the Tunnel Encapsulation Attribute, listing LSP 1, SR 1, and GRE 1. When PE1 receives it, traffic is load-balanced across, or selectively placed onto, the different tunnels. Each tunnel is represented by a TLV in the attribute, with sub-TLVs encoding information specific to that tunnel or encapsulation type.

To use it for MPTE, we extended it so that it no longer represents tunnels to the protocol next hop. Instead, it encodes one or more forwarding branches as "tunnels" on a multicast tree node or an MPTE junction. The extension was actually done previously for multicast purposes; we are basically reusing it. Here a "tunnel" can represent a real tunnel to a remote destination, or just an interface: in the multicast case an incoming or outgoing interface, in the MPTE case a P-hop or an N-hop. You can use different tunnel types for different encapsulations. For example, if you use MPLS for MPTE, it could be an MPLS label; if you use an IP tunnel for MPTE, it could be a different kind of tunnel. Those extensions were already specified in the BGP Multicast Controller draft, which uses a different SAFI for multicast purposes.

Now I want to clarify the new tunnel type we introduced. Previously we mentioned that different tunnels have different encapsulations: an MPLS tunnel, a GRE tunnel, whatever. Sometimes, though, you may not care which encapsulation is used; any kind, GRE or MPLS, that reaches the tunnel destination is fine, or you may be doing native forwarding out of an interface to a directly connected neighbor. That's why we introduced the tunnel type called "Any Encapsulation Tunnel", meaning we don't care, as long as you can get my traffic to that node. The name may be hard to follow, so we'd appreciate suggestions for a better one.

Coming back to MPTE: we need an identifier in the data plane to say which tunnel, or which DAG, the traffic belongs to. That data plane ID could be a label, a SID, or some other kind of data plane identifier carried in the packet, just like a VPN label. It is signaled as a sub-TLV within a tunnel in the Tunnel Encapsulation Attribute. For the MPLS case, we use the Tree Label Stack sub-TLV. That was specified for multicast purposes, which is why the "tree" term appears; for MPTE we use the same thing, because it is basically the same data plane identifier. We could use a better name, maybe "DAG Label Stack sub-TLV", but that's just a naming issue. Why a label stack instead of a single label? In most cases a single label is enough; there are cases where you may want a label stack, and that is well explained in the multicast drafts. Notice that even when you use an Any Encapsulation tunnel, or a GRE or UDP tunnel, to get the traffic to your downstream next-hop node, you can still carry a label stack inside it for the MPLS data plane.

The RPF sub-TLV again comes from the multicast work. In the multicast case, one of the tunnels carries the RPF (Reverse Path Forwarding) sub-TLV, indicating that the tunnel is really just an incoming interface. The sub-TLV is just an indication, with length 0 and no value part. We may change that to a flags sub-TLV, using one bit in the flags field to indicate RPF or upstream. For MPTE, we use the same RPF sub-TLV or flag to indicate the P-hop, so maybe we'll rename it to an upstream sub-TLV or upstream flag.

And the Weight sub-TLV. Remember, earlier I pointed out that each N-hop has a percentage indicating the load share: 50% goes this way, 25% that way, and so on. The draft currently says we will use the Weight sub-TLV, but we have now realized that the Weight sub-TLV in RFC 9830 is specifically for SR Policy segment lists and comes from a different registry; we cannot reuse it. We will need a new sub-TLV from the BGP Tunnel Encapsulation Attribute sub-TLV registry; we might call it a Load Share sub-TLV or something similar. The Weight sub-TLV, the RPF sub-TLV, and the Any Encapsulation tunnel type may all end up renamed, but those are the main tunnel types and sub-TLVs we need for MPTE.
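
Since the code point doesn't exist yet, here is only the arithmetic such a load-share/weight sub-TLV would convey per N-hop, using the 50%/25%/25% split from the R2 example:

```python
def load_shares(weights):
    """Normalize per-N-hop weights into load-share fractions."""
    total = sum(weights.values())
    return {nhop: w / total for nhop, w in weights.items()}

shares = load_shares({"R3": 2, "R7": 1, "R9": 1})
assert shares == {"R3": 0.5, "R7": 0.25, "R9": 0.25}
```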

Susan Hares: Jeffrey, we've hit the end of your time. Why don't you give that last slide, and we'll call it quits. You almost made it on time.

Jeffrey Zhang: Yeah. So this is the 00 version; it explains the concept and describes the procedures. We still have quite a bit of work to do, with details to fill in to give a precise normative specification. We'll continue that work. There are POC implementations going on, Junos-based and FRR-based. We appreciate your reviews, comments, and suggestions. Thanks.

Susan Hares: I'm going to have to cut it there. Thank you, Jeffrey. And by the way, this ties into some of the things Jeffrey has described. Zafar, I will have you come up; let me just finish with Jeffrey. Some of Jeffrey's work comes out of the BESS draft, draft-bess-bgp-multicast-controller. Did I get the right name? Yep. You grab a mic while we switch over; Jeffrey's going to give you the name.

Jeffrey Zhang: Yeah, um draft-ietf-bess-bgp-multicast-controller.

Susan Hares: Okay. Thank you, Jeffrey. Zafar, it's your turn.

Zafar Ali: Hi everyone. Let me know if there's an issue with the audio. My name is Zafar Ali, from Cisco Systems. I'm going to present BGP SRv6 Policy SID List Optimization Advertisement on behalf of my co-authors. I'd like to recall a little history. This draft was presented three IETFs back, and the feedback then was really about the need for SPRING to establish the requirement that this is work SPRING would like to do. Following that discussion, we wrote a draft in SPRING, presented at the subsequent IETFs, 123 and 124. That draft is now adopted work in SPRING, published as draft-ietf-spring-srv6-policy-resource-optimization.

To give a very brief overview of that draft: an SRv6 policy SID list may end with either a node SID or a per-topology adjacency SID. When the SRv6 segment list ends with a node SID as the last SID, and a service or traffic steered over that policy also has a node SID as its last SID with the same intent, the last node ends up dealing with back-to-back local SIDs: one for the policy endpoint node SID and one for, say, the service. This happens, for example, when an SRv6 service is carried over SRv6 policies, or when a binding SID locator carries the traffic to the end node SID. This was discussed in SPRING, and it is not efficient from a compression point of view. So what the SPRING draft describes is the need for an ingress node, when it determines that the traffic steered onto the policy is carried all the way to the egress, to be able to skip or eliminate the node SID when installing the policy.
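
A rough sketch of the optimization the SPRING draft motivates; the SID values and the decision inputs are illustrative only:

```python
def installable_sid_list(policy_sids, egress_node_sid, steered_traffic_reaches_egress):
    """If the SID list ends with the egress node SID and the steered traffic
    already carries a SID taking it to that same egress, the ingress can
    install the policy without the redundant (back-to-back) last SID."""
    if policy_sids and policy_sids[-1] == egress_node_sid and steered_traffic_reaches_egress:
        return policy_sids[:-1]
    return policy_sids

sids = ["2001:db8:1::", "2001:db8:2::", "2001:db8:e::"]  # last: egress node SID
assert installable_sid_list(sids, "2001:db8:e::", True) == sids[:-1]
assert installable_sid_list(sids, "2001:db8:e::", False) == sids
```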

The solution in this draft is very simple. It is about advertising, in BGP-LS, that a node that has installed the policy, the ingress, has actually skipped the last node SID when installing the policy. This is carried by a flag under the candidate path TLV within the BGP-LS advertisement. So the changes are fairly simple. There was another comment at IETF 122 that there is a complementary PCEP extension alongside this IDR one, so there was a request for consistency across the SPRING, PCE, and IDR drafts, which is being addressed as well. The authors are not aware of any outstanding comments, and the draft has been stable. So with that, the authors would like the Working Group to adopt this work to support the SPRING base draft. Thank you.

Susan Hares: Zafar, we've noted the adoption request. I'm going to repeat what I said on the list earlier: we're trying to work through a number of drafts, and the best way to get your draft adopted is to comment on the other drafts on the list so we can move down the queue. We will try to run four or five at a time, but we need to at least complete those first Working Group Last Calls. Any other questions for Zafar? Any comment on this draft from the SPRING chairs, since the base work is adopted there? This brings me to a good thing to mention. One of the things we've agreed with the SPRING chairs is that before you go through the adoption stage... no, come on up, come on up. I'm sorry.

One of the things: if you use a SPRING concept, we're going to ask adopted drafts to include a section that says which SPRING function you map to. I'll send it out on the list, but for SR functions we're going to ask that you describe how the draft addresses SPRING and how it interacts with other features, such as a PCE if you use one. I will explain that on the list, but this is a good moment to mention it. Please go ahead.

Zafar Ali: Okay. Thank you.

Susan Hares: Thank you.

Yangfei Guo: I'm Yangfei Guo from Zhongguancun Laboratory, and I will introduce my work on BGP Communities for Security Policy Intent. Here is the agenda; let's begin with the background. RPKI deployment is increasing: more than 40% of ASes have deployed RPKI, and about 60% of prefixes are covered by ROAs. Route origin validation, ROV for short, produces three outcomes: valid, not found, and invalid. But current security mechanisms verify routing information; they don't express the security expectations of the originating AS. For example, if the source AS wants downstream ASes to drop its routes when the ROV result is not found or invalid, a downstream AS has no way to know, because it cannot learn the source AS's intent; it will make its routing decision by local policy.

Our solution is to standardize a BGP large community. It defines "ROA strict": when a downstream AS receives routes carrying it, it can reject those routes if the RPKI ROV result is invalid or not found. Here is an example: the route flows from left to right, and the source AS adds ROA-strict to its routes and propagates it to all its downstream ASes, which can then make routing decisions using this large community. And here are some use cases. The first is large content networks, which may need this because when the not-found state arises, an attacker may hijack the routes. The second is high-value targeted networks: with ROA-strict, ASes can reject suspicious re-origination. And the third is BGP monitoring: it can help confirm alert events with high confidence.
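
A hedged sketch of the receiver-side policy the talk implies; the large-community value below is a placeholder, not an assignment:

```python
ROA_STRICT = (64500, 1, 1)   # placeholder large community for "ROA strict"

def accept(route_communities, rov_state, local_policy_accepts=True):
    """With the ROA-strict signal, only ROV-valid routes are accepted;
    without it, the receiver falls back to its local policy."""
    if ROA_STRICT in route_communities:
        return rov_state == "valid"
    return local_policy_accepts

assert accept({ROA_STRICT}, "valid")
assert not accept({ROA_STRICT}, "not-found")
assert not accept({ROA_STRICT}, "invalid")
assert accept(set(), "not-found")   # no signal: local policy decides
```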

The last part is the security considerations. The only thing to note is potential abuse of this mechanism. Fortunately, this is a fail-closed problem: it won't influence other networks, and it is just a signal. And here is the conclusion: this is a lightweight mechanism to convey the origin AS's expectations. That's all. Thank you. Any questions or comments?

Susan Hares: Stephane, you're first. And then...

Stephane Litkowski: Yeah, I'll try to be quick. Isn't this something that should be discussed in SIDROPS?

Yangfei Guo: Yes. But I haven't discussed it there, because I think this is a BGP large community.

Susan Hares: So, we just had that discussion on the list this morning; yes, it would be with the SIDROPS chairs. Charters are wonderful. This will need to go to SIDROPS, because we've identified that that set of protocols is under their control. But please give him feedback as well.

Stephane Litkowski: Yeah, so I will provide feedback. I think you may have good intent in what you're trying to propose, but every time we propose a new tool in the security area, it can be misused and create additional security holes. So you can probably continue, and we need to discuss it in more detail, but we really need to keep in mind that anything we add could create additional security issues for networks.

Yangfei Guo: Okay. Thank you.

Susan Hares: And I have a small question, because you mentioned that an out-of-band mechanism such as an RPKI object might eventually be a better fit. So how should we position this large-community approach? Is it intended mainly as an incremental deployment step, or do you see it as the long-term mechanism for signaling origin security intent?

Yangfei Guo: Okay, thank you. This is just an idea I haven't covered in the slides; I will discuss it with you later.

Susan Hares: May I ask you to discuss that offline? Right now we're out of time. Would that be okay? That would be wonderful. And you will need to go to SIDROPS for standardization in this area. Thank you. Okay.

Susan Hares: It's my turn. Do you think you can pull up the chair slides again? Okay. I left you with a set of questions at the beginning; Keyur's going to bring them up. We'll just go to the end. This is the time when the AD, who's online, the other chairs, and I listen to what you do and do not want in the IDR charter. Stephane's raised the point that we should not go back into BGP auto-configuration. If you agree with that, let us know; it's not in the charter at this time, and if you want it in the charter, it needs to be there. We did that work almost five years ago, and it moved over to LSVR, because there was more synergy there. There's L3DL, I think. Did I get the right name, Keyur? Yeah, L3DL, since I'm a co-author. So there's some work ongoing there in LSVR. Xiao Hu's proposal didn't fit within that one, so there are other places. But please give us feedback. If you're feeling shy, you don't have to speak at the mic. And now I'll leave it open for anyone who'd like to speak. The charter is online. Here's the charter; please take a look at it. Putting the charter on the screen and squinting at it doesn't work as well as you clicking on it yourself. It's a different style, but other than leaving out the things we had already left out, it's pretty much the same; we tried to keep it the same. Ketan, is there anything you'd like to say?

Ketan Talaulikar: Uh, yeah. Just to clarify what you said: the intent here has been to take in all the currently adopted work and all the work that is being done. It's just been listed out explicitly because, as Sue mentioned, every IESG is different, and what we are seeing is that questions about whether work is in charter or not are being raised, and if that happens late, when the document comes to the IESG, it's a problem we want to avoid. So that's really about it. Anything that we are doing that is not captured here, please bring it to our attention.

Susan Hares: Okay. I'll go back to some of the things people have said to me. How do we hand off... you notice that IDR is trying to be effective. You've got a lot of really good ideas, and we're really grateful for all of them. We're trying to effectively process the things that have been adopted and implemented, and adopt more, so we can help refine them and get stuff in. Please, we're still concerned. I think some of the SRv6 people, one of the drafts was composite link, found out that I'm really interested in scaling and what happens. And we're all interested in interoperability, as Stephane and Jeff have mentioned. If there is something already implemented, IDR tends to say, "Why don't you use what we've done?" That's usually one of the questions you will get when you come to us. Anything else for the charter? We will sit here for a minute while you stare at your screen and see if there's anything. This is one time we're going to wait in quietness.

Jeff Haas: If you're online, just go ahead and jump in on the mic. There's a little bit of discussion in the chat channel. Channeling Dan Voyer, one comment was that IDR is a bit too wide and it might be worth optimizing things and moving some of the work elsewhere. The response I gave him in the chat, which you're covering, Sue, is that this is a shared technology. Moving around who does the work might at least make IDR run smoother in terms of what work is done where, but it doesn't necessarily address the problem: this is a shared technology, and simply moving around who does the work doesn't necessarily solve the "please don't break me." So that piece of the issue needs to be addressed regardless of who does the work.

Susan Hares: This is one time we'd love to hear more comments, but if it's a process question, or if you'd like to think about it and send us notes, please do. How will it affect our daily life? We haven't really made a dramatic change, so we're still just going to keep pushing forward to refine drafts and to ask questions. Go ahead, Stephane.

Stephane Litkowski: One very small question. Any deadline to close that? One more time. Any deadline? Do you have any target when you want the charter to be updated?

Susan Hares: Uh, Ketan, what's the deadline for the charter?

Ketan Talaulikar: Uh, so we've been talking about this for three IETFs. Sooner would be better. My concern is that some of the other documents that are coming up to the IESG may not fit the letter of the current charter. So I would say: yesterday. But the key part is that we try to move forward with whatever we get consensus or agreement on, and obviously we tweak things as we go. Nothing is cast in stone here and now.

Stephane Litkowski: Yeah. So probably, Ketan, you or the chairs should really put some deadline on it, like we do, for instance, for adoption calls, so you can gather feedback. If you don't have any deadline, people will just take their time and never provide feedback.

Susan Hares: Ketan, I think we'll talk offline and set a deadline. I tend to set, you know, a three-week deadline and say that's it, but you're my AD, so I'm sure you'll give me some wisdom. Okay! Well, in which case you've got back one whole minute of your life from IDR. Thank you very much for attending. If you wanted to talk more at the mic and we had to really drive through this, please go ahead and talk to the person you wanted to have the discussion with. Thank you very much. Oh, we will have a session on Friday as always, so please join that one. I believe it's the first session there, so there will be lots more drafts: lots of good SRv6, lots of good FlowSpec, and lots of good other drafts.


Session Date/Time: 20 Mar 2026 01:00

Susan Hares: Okay, I'm first, so I think we're good.

Keyur Patel: Hey, checking my mic. Can you hear—can you hear me?

Speaker: Yes, we can hear you.

Susan Hares: We can hear you well, Keyur. Keyur, whatever the speakers are doing, they're interfering with my hearing aid, so if I miss something, make sure to speak up. Okay? I'm going to start with IDR status. We had a little bit of interesting information on early allocation, so I've added some early allocation slides, thanks to Bruno and Robert. This is the status from Monday. The charter: we're working on a re-chartering, and it's on GitHub. Our status is on GitHub. We just sent off draft-ietf-idr-vpn-prefix-orf. Keyur, is this too hot? Should I be standing back, guys? Hot meaning the sound. How's the sound? Okay, thank you. draft-ietf-idr-sr-policy-nrp is ready to go off; I just have to push the button. And for draft-ietf-idr-nhc, Jeff, excuse me, John needs to make one more correction. The rest will come shortly as the shepherds work through them. The one that's probably last is the CT SRv6; that takes a bit of SPRING coordination. Okay. We need your comments. How many people are waiting for a draft to go into adoption? People here? Yeah. Well, if you're waiting, please answer the other calls, because we have to complete the calls we're dealing with before we can go on. Okay. Seglist is in the middle. Why is draft-ietf-idr-sr-policy-seglist-id sitting there? When we went through NRP, we did the working group last call last March, and then the post-process took us a year, okay? Why is that? Because we didn't carefully plan for the interactions with SPRING. So—

Okay. Early allocation cleanup. I'm going to restate some basics. No early allocation should be used in a released product unless it is assigned by IANA or it's a new registry specific to your draft. If you don't know what that is, talk to me; I'll be glad to walk through the process with you. But Bruno and other people have noticed things in the wild. Why did that occur? Part of it's my fault, okay? RFC 9830, SR Policy, was delayed based on lots of things, and people wanted to work. My error was trying to give some advice to implementers, thinking you'd be working in your labs while we tried to get it done. That's really not the right way to do it, okay? I've been trying to slowly clean it up. We have a long-term thing, and by the way, Adrian warned me it was a bad idea, and I didn't listen to him carefully enough. So, mea culpa; that means I apologize, it's my fault. Long-term, IANA is revising their policies for early allocation and even considering an early registry, so that if we had SR Policy and we were at the last place we were, we could use that. But it's in process at IANA-BIS; that's a working group. If you're interested, join that working group, but if you just want your IDR allocation, come see me and we'll work through it. Okay. The second thing that caused NRP to be delayed: we didn't carefully link the BGP-SR and BGP-SR-TE drafts to SPRING concepts. Then when you get to the last call, it says "oops," it doesn't have the right support. Okay? Seglist hit that, and we're talking about it on the SPRING list. NRP took a whole interim to talk about it; that took nine months to get through, okay? So, to better coordinate, anyone who uses SPRING, I need you to put in a cross-working-group section. This means when the reviewers look at it, they will say, "Oh look, that's the draft and the concept I'm trying to make a mechanism for in SPRING." If there's a link to PCE, because another question that came up is, "Oh, can we set NRP via PCE? How does that interact?", note it. And if it's absolutely needed in SRv6 operations, say you're China Mobile, China Unicom, or some of the other big players who've done SRv6, put that in there. IDR has long had the attitude: if an operator needs it, it goes to the top of the list, because we're all about building things for operators.

Okay. If you want more about the problem, come to SPRING; I've got a long discussion there. Other than that, IDR is busy. Be ready: I'm going to finish this within three minutes if I'm good, so stand up and come get ready, because we'll pick it up. We have this morning draft-ietf-idr-sr-policy-metric. It has a few things in its early allocation, but we're not going to talk about that; we're going to have her go through the information about the draft. Then draft-ietf-idr-bgp-ls-bgp-only-fabric; Arvind has some things to say. Again, these are two adopted drafts. Then we have new core features from Changwang and Zhenqiang. Core features: we're now starting to tell you whether a draft is SR Policy, SR-TE, FlowSpec, or BGP-LS, so "core feature" says it's not in those categories; that's how we're categorizing. Our AD has asked us to do that so we can see how our drafts go, that we're progressing enough in each category. TE is next, BGP-LS is next, and then another core signaling one by Mankamana; he switched with someone on Monday, so we fit him in at the same time. Then MVPN FlowSpec, and at the end, we'll have a presentation from a student on how he's looking at some BGP convergence. He's at the end. We're going to slip everything in, so things are tight.

Speaker: Slides? Do I have to take it back? Take it back first. Clicker to get it. Yeah. Click it again. Share slides. The share slides button. Yeah. We got two. This one, yes. Okay. And the clicker again.

Ka Zhang: Okay. Hello everyone. This is Ka from Huawei, and my topic is draft-ietf-idr-sr-policy-metric. This draft is at working group version 4. We add metric information to the BGP SR Policy SAFI; the metric of an SR Policy can be used for next hop selection, and we define a metric sub-TLV and a segment list sub-TLV. Since the last version, we have got some comments. The first comment is from Ketan, about the metric type registry. In this draft, we would need a new metric type registry under BGP tunnel encapsulation, and this metric type is identical to the BGP-LS SR Policy metric types, so Ketan suggests reusing the existing registry for the BGP SR Policy SAFI. It is clear that the metric type defined in this draft is the same as the BGP-LS SR Policy metric type, so we can reuse the code points. But the question is the name: the registry name is "BGP-LS SR Policy Metric," so maybe we can rename the registry. The next comment is from Amanda, for IANA review, about whether a new registry is required. In this case, we clarified that a new sub-TLV registry is required, the SR Policy Segment List sub-TLV registry, and we will update the reference to the BGP tunnel encapsulation registry group. Also, in response to Ketan's comments, a new registry for metric type would not be needed. Then we got some comments from Zhenqiang. The first is about metric granularity: the segment list represents the active path's data path, and the metric can vary across data paths, so we propose that the metric is a per-segment-list attribute. This has been aligned and agreed offline. The second comment is to clarify the usage of the SR Policy metric in path selection. The usage of the metric is illustrated by an example in this document, and we will further refine and elaborate on it in the next version. Then there are some comments from Susan, also about the registry for SR Policy segment list sub-TLVs; we will update this in the next version. We will also update some references from drafts to RFCs, add some security description for the metric usage, and fix the table format for code point 5 of the metric type. As next steps, we will fix all the technical and editorial issues in the next version, finish the code point early allocation, and then request working group last call. Thank you. Any comments?
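
To make the encoding discussion concrete, here is a minimal sketch of what a per-segment-list metric sub-TLV could look like on the wire. The sub-TLV code point is a hypothetical placeholder (pending the early allocation mentioned above), and the metric type values are assumed to reuse the BGP-LS SR Policy metric type code points, per Ketan's comment; this is not the draft's normative encoding.

```python
import struct

# Hypothetical code points for illustration only.
SUBTLV_METRIC = 0x7A   # placeholder, pending early allocation
METRIC_TYPE_IGP = 0    # example value from the shared registry

def encode_metric_subtlv(metric_type: int, metric_value: int) -> bytes:
    # metric type (1 octet), reserved (1 octet), 32-bit metric value
    body = struct.pack("!BxI", metric_type, metric_value)
    return struct.pack("!BB", SUBTLV_METRIC, len(body)) + body

subtlv = encode_metric_subtlv(METRIC_TYPE_IGP, 100)
assert len(subtlv) == 8  # 2-octet header + 6-octet body
```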

Susan Hares: So Ka, the one thing we need to do fairly quickly, if you have time, is go talk to Amanda together and make sure everything's working for your allocation. You've done some excellent things, and then we can go back through a working group allocation call. Did you make the other corrections to the draft?

Ka Zhang: Uh, no. I—I will update it later in the next version. Yeah.

Susan Hares: Okay. So after we talk to Amanda today, maybe after IDR, we can confirm that everything's fine with her, so that doesn't add any more delay. You make the corrections, then we'll do a second working group allocation call, and then you can head toward working group last call. Yay!

Ka Zhang: Okay. Thank you. Thank you very much.

Susan Hares: Thank you. Okay. Please go ahead.

Zhenqiang Li: Yes. Zhenqiang Li from China Mobile. One more action for the next steps: as discussed and agreed with the authors from Huawei and China Mobile, this draft should be merged with my draft, and we all agree with it. So that's an important action for the next step.

Ka Zhang: Okay. We will discuss it later offline. Okay.

Susan Hares: Any other questions? Otherwise, we'll go right to the next speaker. Thank you.

Speaker: This one? Yep. It's Arvind. You should have the control.

Arvind Babu: Yeah, I have the control. Thanks, Sue. Hello, good morning all. I am Arvind Babu from Cisco Systems. Today I will be presenting the updates on BGP-LS for BGP-only networks on behalf of my co-authors. Let me quickly explain what this draft is proposing. BGP-LS, originally defined in RFC 9552, was designed to export link-state and traffic engineering information to the northbound. Historically, BGP-LS has focused on distributing information learned from the IGPs. RFC 9086 expanded this by introducing the BGP protocol for exporting EPE links. This draft defines the mechanism to distribute network topology and traffic information in networks where BGP is the only routing protocol. So why is this required? Many data centers, MSDC fabrics, and emerging AI-DC networks run BGP as the only routing protocol. Unlike IGP networks, BGP-only deployments do not have a standard way to expose full underlay topology and traffic engineering information to controllers. Topology visibility is increasingly necessary for optimized fast convergence and resiliency solutions in data centers. Refer to ongoing discussions in RTGWG on the topics listed, like AI/ML-driven IP FRR and resiliency optimization, efficient route protection, and IP fast reroute in BGP-only networks. By exposing these topologies, we can enhance TE capabilities, ensuring efficient end-to-end traffic engineering between data centers over the WAN. So how is this done? Each BGP speaker is enabled to advertise its own local topology into BGP-LS using the standard NLRI types: node, link, and prefix, as defined in RFC 9552. The draft mostly reuses the existing BGP-LS descriptors as well as the attribute TLVs. What the draft adds is the procedure for how those NLRIs and TLVs are advertised in a BGP-only fabric context. It also introduces a new BGP route type TLV in the prefix descriptor so that the receiving system can correctly understand the type of BGP-derived prefix information being carried. Beyond that, the approach is intentionally conservative: it reuses existing BGP-LS attribute TLVs and does not introduce any new attribute TLVs at this stage.
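
As a rough illustration of the one new code point described above, here is a sketch of a "BGP route type" TLV for the BGP-LS prefix descriptor. The TLV code point and the route-type enumeration value are hypothetical placeholders; only the 2-octet type / 2-octet length framing follows RFC 9552.

```python
import struct

# Hypothetical values for illustration only; the real code point would
# come out of the BGP-LS registry via expert review.
TLV_BGP_ROUTE_TYPE = 0x201   # placeholder TLV code point
ROUTE_TYPE_EBGP = 1          # placeholder enumeration value

def encode_bgp_route_type_tlv(route_type: int) -> bytes:
    # 2-octet type, 2-octet length, 1-octet route type value
    return struct.pack("!HHB", TLV_BGP_ROUTE_TYPE, 1, route_type)

tlv = encode_bgp_route_type_tlv(ROUTE_TYPE_EBGP)
assert len(tlv) == 5
```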

So what remains the same? This draft is intentionally made non-disruptive: it does not change the base BGP underlay, the IPv4 and IPv6 routing behavior, or the normal BGP decision process. It also preserves existing peering models and operational design. The current text mainly focuses on the common EBGP single-hop model mentioned in the style of RFC 7938, but at the same time, it does not rule out any other peering approaches. And the BGP-LS sessions to controller simply ride on the reachability which is already provided by the underlay. So the key message is that this adds topology visibility without requiring operators to redesign how the fabric itself routes. So what—what are the different modes of deployment? The slide covers two different modes. On the left is the distributed mode, where each router shares topology information through the route reflector, and that information is distributed back across the network or to other routers. What we have on the right is a centralized mode where one or more BGP speakers or route reflectors, they consolidate these topology info and provide them to a controller for achieving end-to-end visibility and centralized computation. Meanwhile, the routers continue to run the underlay for IPv4, V6 exactly as they do today, using standard BGP decision process to build forwarding.

So what topology information is advertised? Right now we advertise node-level information like node name, BGP router ID, and ASN; link-level information like link name, link IDs, addresses, bandwidth, and SR adjacency SIDs; and prefix information like SR prefix SIDs. But in general, other BGP-LS TLVs can also be leveraged based on the use case. As next steps, implementations are currently underway; based on the experience from the implementations, the draft will be updated. We'll also be requesting code point allocations. Finally, we invite review and feedback from the working group to advance the proposed extensions. Thank you. That's all I had.

Susan Hares: So I'm going to ask this of our AD, Ketan. Ketan, is there anything special that we need to check on this draft before the early allocation request?

Ketan Talaulikar: Uh, so speaking as a co-author and not as AD: as Arvind mentioned, we are not yet at the point of asking for code point allocation, but we'll be there soon. This would be BGP-LS, so it would be the normal expert review process; it's not an early allocation process.

Susan Hares: Okay. So help me when we get to the allocation: remind me that you're an author, so I can find someone else if we need some help.

Ketan Talaulikar: Yeah, this—this is not—yeah, this does not need an early allocation process, this goes directly to expert review process.

Susan Hares: Okay, thank you. That's what I thought. Yeah, you can just go to the expert review. Okay. Thank you. Any other comments on this draft?

Keyur Patel: Hi, Keyur Patel, Arrcus, speaking as a working group member. Minimally, you should cross-reference this work with LSVR, because while IDR handles LS extensions, this LS extension seems predominantly defined for data center use cases, which are handled in LSVR. Thank you.

Arvind Babu: Thanks. Thanks, Keyur.

Susan Hares: Kireeti, you're next.

Kireeti Kompella: Hi, Kireeti. I was trying to write this draft; I probably messed it up, as Robert pointed out to me. So I'm happy to talk with the authors and understand what they're intending, but this is exactly what I wanted. The one thing I wanted to add to this is node capabilities, which I don't know if BGP-LS carries today, because I want to understand which nodes are capable of doing certain things. There's an IGP extension that will allow you to say, "This node can do XYZ." So we have a draft in LSR that adds node capabilities for MP-TE, multi-path TE. I want to carry that as well here. Other than that, I think this draft does what I need.

Arvind Babu: Yeah, thanks, Kiriti. We can discuss and consider the comments and add further extensions. Thanks.

Kireeti Kompella: Okay, thank you.

Susan Hares: Fortunately, we have one of the co-chairs of LSR, so I think they've written it down. Anything else? Okay, thank you.

Speaker: Okay, just let me—I don't have it. Did you already set up? No, here we go. Click. Release the from Arvind. Yep. No, no. I—I've got to get the other slide. Do the share slides thing. Yeah, I've already put it in there. Share slides. Okay, share slides. Yep. Share slides. You put the next slides up? Yeah, yeah. I already put it in there. Okay, that's what I was asking. Uh, Interface Capability. Yep. Interface Capability. I got it, then. Nice. Okay, close this.

Changwang Lin: Okay. Hello everyone. This is Changwang Lin from H3C. I'm going to present this draft with our co-author from HPE. This draft is about the interface index capability for BGP. First, the background. In the picture on the right, two different ASes establish three parallel unnumbered peer links. These three parallel peers have the same local address, peer address, local AS number, and remote AS number, so the parallel peers cannot be distinguished by these identifiers. So an interface index needs to be added to identify the local unnumbered peer, and the local and remote interface indexes need to be added to associate the local unnumbered peer with the remote unnumbered peer. IGPs have mechanisms to learn the remote interface index: for OSPF, the interface index TLV defined in the OSPF Hello packet; for IS-IS, the extended circuit ID TLV defined in IS-IS Hello PDUs. So an IGP can learn the remote interface index, and it can get the remote identifier before advertising the link in the LSDB. Unfortunately, BGP has no such mechanism to get the remote unnumbered interface index. There are some use cases. The first use case is BGP-LS SPF. In the figure on the right, two ASes establish three parallel unnumbered links. As described in RFC 9815, for unnumbered links, the link local and remote IDs are used, and it recommends that the local and remote IDs be known, but that RFC doesn't define any mechanism to learn the remote interface index. When BGP-LS SPF runs the SPF calculation, the two-way link check must be performed, and we need to use the local link to find the reverse link. If BGP does not have the remote interface index, the bidirectional check becomes a problem. Another use case is BGP EPE. When BGP EPE reports the BGP topology, it contains link NLRIs and the link NLRI attributes. As described in RFC 9086, the link NLRI includes the local node descriptor, the remote node descriptor, and the link descriptor, and the link descriptor must include the local and remote interface identifiers. In the same example, when the parallel unnumbered BGP peers report BGP link NLRIs to the controller, if the link NLRI does not contain the remote interface ID, the controller cannot map the local EPE link to the remote peer link NLRI, so it cannot associate the local peer link with the remote peer link. The solution in our draft is the same as the IGP mechanism: learn the remote interface ID before establishing the BGP peers. The routers exchange local interface IDs in the OPEN message. Then, after the BGP peers are established, BGP can advertise the links with the local and remote interface indexes. Finally, the protocol extension: our draft defines a new interface index capability TLV in the OPEN message. When a router receives an OPEN message containing the remote interface index, it records the remote index. Then, when BGP advertises BGP link information, it can include both the local and remote interface indexes. That's all. Thanks to Sue for her reviews and suggestions. We will update the draft accordingly. Any questions or comments are appreciated.
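
For illustration, here is a minimal sketch of how the proposed interface index capability could be framed in the BGP OPEN message, using the standard RFC 5492 capability encoding. The capability code and the 4-octet value layout are assumptions; the real code point would be assigned by IANA.

```python
import struct

CAP_INTERFACE_INDEX = 200  # hypothetical capability code, for illustration

def encode_interface_index_capability(local_ifindex: int) -> bytes:
    # RFC 5492 framing: capability code (1 octet), length (1 octet), value
    value = struct.pack("!I", local_ifindex)  # 4-octet local interface index
    return struct.pack("!BB", CAP_INTERFACE_INDEX, len(value)) + value

# Each side learns the peer's interface index from the OPEN exchange and
# can then fill in both the local and remote link identifiers when it
# advertises the unnumbered link in BGP-LS.
cap = encode_interface_index_capability(local_ifindex=17)
```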

Kireeti Kompella: Hi, Kireeti. Since the theme is to do IGP in BGP, I think this is useful.

Changwang Lin: Thanks.

Susan Hares: Thank you, Kireeti. Cute. Are there any other comments online? Acee, you are online; please go ahead.

Acee Lindem: Hopefully you can hear me. This would help for one of the other cases in BGP-LS SPF as well. I haven't thought about this yet, and I hadn't read this draft before, but knowing the remote index would help disambiguate and identify unnumbered links in BGP-LS SPF as well.

Susan Hares: Okay. I encourage people to think carefully about this draft. We'll have the authors come back, but I really think you need to look at this one and evaluate: is it worth the risk? Okay? Kireeti, you may go again.

Kireeti Kompella: Thank you. Kireeti. So I think the high-order bit here is: in data centers, people don't run an IGP, and I want to know why. I mean, this has been sort of the Bible for so long. That's why we need this draft, and that's why we need the previous draft, the draft I was trying to write, which I didn't do well: basically to essentially use BGP as an IGP. I understand everything about dump trucks, but we're not actually changing what BGP is carrying; that's already done in BGP-LS. We're just changing the mechanisms. But really what we're doing is reinventing the IGP in BGP, because people don't want to use an IGP in data centers. So if someone figures that out, we don't have to do all this, but we have to understand why this is happening and where the antipathy to IGPs in data centers comes from.

Susan Hares: Go ahead, Tony.

Tony Li: Tony Li. I'd like to answer that question. Long ago, far away, in a galaxy not too far from here, um, someone built a data center and used OSPF. And they used a particular OSPF implementation which was not fully mature. And they had a bad time. They had a very bad day. And they started looking for other solutions. And a person at the vendor sat down in the lab and did a study and showed that BGP sent fewer messages than OSPF did, and therefore they should try BGP. And the data center operator looked at that and said, "Oh joy, oh wonder, we'll try it." And they liked it. And they liked it so much they published an RFC, and it's informational status, but it became gospel in the industry. And now when you go to a data center operator and ask them to run an IGP, you get back a scatological response. Um, so the industry is locked in on an IGP. Uh, both Tony P. and myself have tried to redirect them and give them better solutions, uh, but they are extremely resistant. This is an installed base problem, and they have all the money and they are very happy with their solution. They have no pain points, and mm, what we think doesn't matter.

Susan Hares: Thank you. Jeff, you're next.

Jeff Haas: Hi, Jeff Haas. So, wearing partially my chair hat here. We're seeing a number of extensions coming out in a number of different places to expose link endpoint information. So the draft here is, you know, effectively how do we get visibility about IF indexes? We've seen at least two proposals this week in different groups about ports. We've seen other places where this ties into other flavors of traffic engineering. So the thing I'm flagging here for the working group to pay attention to is, as we get to all these different ways of trying to expose, you know, the hidden inner details of how the links are actually attaching to devices, we should be careful to try to pick a consistent solution for that. That's it.

Susan Hares: Okay. We're going to have to cut the line at this point because we're behind. But thank you, Changwang. I encourage conversation on the list on this draft. Thank you. Please next draft. Zhenqiang. Release it first. Click it again. Again. Got it. No. No. Four. Yeah. Got it. Should be yours.

Zhenqiang Li: Yes. Good morning. My name is Zhenqiang Li, from China Mobile. On behalf of the co-authors, I'd like to present this document, BGP Extensions for Network Resource Partition. It is a very simple draft: a new type of BGP extended community is introduced in this document to carry the NRP ID information. So why do we want to do this? To decouple SR Policy from NRP and increase the scalability of NRPs. Existing approaches bind an SR Policy to an NRP on a one-to-one basis. draft-ietf-idr-sr-policy-nrp extends the BGP SR Policy protocol to distribute the NRP ID together with its associated SR Policy when the SR Policy is delivered from an SDN controller to the headend node in the network, as shown in this picture. This solution is feasible and can be deployed within a limited domain with a limited number of NRPs. However, it lacks flexibility and incurs significant operational overhead as the number of NRPs scales, especially when multiple NRPs share the same SR Policy path. Please look at the picture below. Three services demand three NRPs to meet the low latency and bandwidth requirements. In this scenario, the controller has to set up three SR Policies, and these SR Policies share a common low-latency path from A to F. And a BFD or SBFD session is usually needed for each SR Policy to quickly detect the liveness of the policy, right?

So how do we do it? Extend BGP, rather than BGP SR Policy, to advertise the NRP ID information in routing updates between network nodes directly. This solution decouples SR Policy from NRP while coupling NRP with the services that demand the corresponding NRP for forwarding resources. Our key idea is that it is the service, not the SR Policy, that requires the NRP. Our solution allows multiple NRPs to share a common SR Policy path, thereby avoiding linear growth of SR Policies. This eliminates heavy OpEx burdens related to SR Policies, removes duplicate SR Policies, and avoids the need for the associated BFD or SBFD sessions. The NRP ID is carried in a transitive opaque extended community; we call it the NRP ID Extended Community attribute. The format of this attribute is also very simple: a new type code is requested to be assigned by IANA, and currently only one field is defined in it, the NRP ID. The procedures: first, the prerequisites. An appropriate SR Policy has been set up, and the forwarding resources for the NRPs are reserved. All nodes along the SR Policy path support the data plane encapsulation measures for NRP. How the prerequisites listed here are satisfied is out of scope of this document.
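
As an illustration of how simple the attribute is, here is a sketch of the 8-octet encoding, assuming the transitive opaque extended community type (0x03), a hypothetical sub-type, and the NRP ID occupying the 6-octet value field; the actual sub-type and field width are whatever IANA assigns and the draft specifies.

```python
import struct

TYPE_TRANSITIVE_OPAQUE = 0x03
SUBTYPE_NRP_ID = 0xEE  # hypothetical sub-type, for illustration only

def encode_nrp_id_extcomm(nrp_id: int) -> bytes:
    # 8 octets total: type, sub-type, then the NRP ID in the value field
    return struct.pack("!BB", TYPE_TRANSITIVE_OPAQUE, SUBTYPE_NRP_ID) \
        + nrp_id.to_bytes(6, "big")

ec = encode_nrp_id_extcomm(42)
assert len(ec) == 8  # extended communities are always 8 octets
```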

For BGP updates: according to service requirements, the endpoint node carries the NRP ID information in the NRP ID Extended Community attribute introduced in this document when advertising service routes to the headend node. Upon receiving BGP updates with NRP IDs, the headend node associates the NRP ID in the routing update with the corresponding routes. For traffic forwarding: when the headend node receives a packet matching the BGP route corresponding to an SR Policy and an NRP, it encapsulates the packet and forwards it to the first segment of the corresponding SR Policy. The packet is then forwarded with the network resources corresponding to the NRP ID. The specific data plane encapsulation is not touched in this document. The intermediate node forwards the packet along the SR Policy path based on the destination IP address (actually a SID in the SR Policy scenario), and the packet is forwarded with the network resources corresponding to the NRP ID carried in the packet, whether it is carried in the source IP address or an extension header. The endpoint node decapsulates the packet and forwards the inner packet accordingly. That's all for the procedures. As a next step, further reviews and comments are greatly appreciated. That's all for my presentation.

Susan Hares: Zafar, you're next.

Zafar Ali: Hi, sorry. Thank you. If you go to your slide number 3: the thing is that the SR Policy color already shows the intent. So this whole idea of carrying another ID to show the intent is not a good idea. In fact, the reason you're having an issue on that slide, slide 3, is that the construct for the NRP in the draft-ietf is also very loose. Is it a control plane instance? Is it a data plane instance? Is it the intent? If you go to slide 3... yeah. So if you only want low-latency slices, you don't have to create three NRP IDs for that. That can be encoded as one color, and that's what we do today. So I think this whole thing needs a lot of clarification, actually in the draft-ietf documents as well: there is a SPRING draft and there is an IDR draft. That is where the clarification needs to go, and once those clarifications are done properly, you shouldn't have the issue you've highlighted.

Susan Hares: Zafar, to clarify your comment: is your comment not only on this draft, but also on the draft in SPRING and the one we just passed through working group last call in IDR?

Zafar Ali: Uh, yes, yes, because the NRP ID itself, what it is, is used very loosely. And then people think, okay, the NRP ID is carrying a low-latency intent, so it is a control plane construct. You already have a control plane construct in SR Policy to give differentiation or satisfy the intent. So there are quite a few things here; I don't think we need this draft, because this could be done with color, but what needs to happen is this clarification in the IDR and SPRING drafts. There were some comments, I believe, when this work was being done, but there is still some discussion.

Susan Hares: Okay. I will talk to SPRING today about this and mention what you said. Drop me a note in email, would you? Thank you.

Zhenqiang Li: Uh, a quick clarification.

Susan Hares: Any other comments on this draft? Okay, we're out of time. Thank you so much.

Speaker: Click. Release it from Arvind. Yep. No, no. I—I've got to get the other slide. Share slide. Click. Share slides. Button. Share slides. Yep. Got it. Should be yours.

Zhenqiang Li: Yes. Good morning everyone, I am [Zhenqiang Li] from China Mobile, and I'm going to give a presentation about BGP SR Policy extensions for BFD configuration. Let's begin with the motivation. In controller-based deployments, SR Policies are typically distributed using BGP SR Policy or PCEP. However, in current deployments, SR Policy provisioning and BFD or SBFD configuration are performed independently, often through manual configuration, NETCONF, or additional automation mechanisms. This separation introduces several operational challenges: increased operational complexity, longer provisioning time, and potential inconsistencies between SR Policy and monitoring configuration. While PCEP extensions to carry S-BFD parameters for SR Policies are already available as a working group document, this proposal focuses on BGP extensions. Since the first version, we received feedback from experts and produced an update. The new version updates the TLV encoding with BFD or SBFD parameters and clarifies the relationship to RFC 9026. RFC 9026 defines a new BGP path attribute to associate BFD sessions with BGP routes. However, to use RFC 9026 for the SR Policy use case, additional considerations would be needed. In particular, it may require enhancements to the SR Policy model to interpret and act upon BGP BFD state at the candidate path level. In addition, support for S-BFD would need to be considered. This approach would likely require further discussion in the working group.

This document extends BGP SR Policy, defined in RFC 9830, to distribute BFD or SBFD configuration parameters alongside SR Policy candidate paths. New optional sub-TLVs for BFD or SBFD monitoring are defined within the tunnel encapsulation attribute. These sub-TLVs let the controller specify BFD or SBFD settings per SR Policy candidate path. As shown in the diagram, the workflow is: first, the controller sends a BGP update message containing the SR Policy NLRI with the new BFD or SBFD sub-TLVs. The headend router parses the message, extracts the BFD or SBFD parameters, and passes them to the SR Policy model. The SR Policy model stores the candidate path and instantiates the corresponding BFD or SBFD monitoring session. Now let's look at the encoding details. We have defined two new sub-TLVs. The first is the BFD parameters sub-TLV, which contains the fields defined in RFC 5880, and the second is the SBFD parameters sub-TLV, which contains the fields defined in RFC 7880. In both TLVs, we use flags to indicate which optional fields are present. One important point to explain is the discriminator handling. In classic BFD, the Your Discriminator can be learned during session establishment, so this field of the BFD sub-TLV is optional. In SBFD, the reflector doesn't establish a session or perform discriminator negotiation; therefore, the initiator must know the reflector discriminator in advance, which is carried in the Your Discriminator field. That's all. Questions and comments are welcome. Thanks.
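
For illustration, here is a minimal sketch of an SBFD parameters sub-TLV trimmed to the one field SBFD cannot learn in-band, the reflector discriminator, as explained above. The sub-TLV code point and layout are assumptions, not the draft's normative encoding.

```python
import struct

SUBTLV_SBFD_PARAMS = 0x77  # hypothetical code point, for illustration

def encode_sbfd_subtlv(reflector_discriminator: int) -> bytes:
    # S-BFD (RFC 7880) has no discriminator negotiation, so the initiator
    # must be told the reflector's discriminator up front.
    value = struct.pack("!I", reflector_discriminator)
    return struct.pack("!BB", SUBTLV_SBFD_PARAMS, len(value)) + value

subtlv = encode_sbfd_subtlv(0x0A0B0C0D)
```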

Jeff Haas: Hi, this is Jeff Haas, and I'm wearing two hats for this conversation. One is IDR chair and the other is BFD chair. So as we discussed on the mailing list, your slide 5 where you're putting in basically most of the BFD packet itself is really not necessary. You know, the RFC 9026 work is just basically to say that if you need to have the discriminator or other information passed along to bootstrap the session, you pass the minimal amount of data. So the first piece of advice I'm repeating: you're just looking to pass the minimal amount of state, and that's discriminator and maybe some of the endpoint information since this is SR. The second piece of advice is that implementations that are passing this stuff in the control plane don't usually get to set up what the timers are for the BFD session. So while you may have as part of your use case to try to say that a given SR path is supposed to be protected by a specific set of timers, it is unlikely that implementations will respect those timers passed along in the protocol. So I'd recommend removing those. Thanks.

Susan Hares: Any other comments? Uh, you realize Jeff is the BFD chair, right? Yes. Okay, I would listen to him. Any other comments?

Zafar Ali: Yeah, I was just wondering: this just becomes quite chatty, carrying all the cross-dependency parameters in the protocol. So I wonder if maybe some sort of profile ID or something of that nature could be used instead; then the device can map that ID to whatever local configuration it needs, so that you don't end up doing this chatty business in the protocol itself. By the way, PCEP supports this way of doing it.

Susan Hares: Okay. We've got about a minute for the last. Zhenqiang, if you go ahead, but no one after him. Please go ahead.

Zhenqiang Li: Yes. Zhenqiang Li from China Mobile, a co-author of this document. Just a brief clarification: this is the first presentation of this document. Our goal is to align the BGP SR Policy protocol with PCEP. Both BGP SR Policy and PCEP can be used to provision SR Policies. PCEP has been extended to support S-BFD parameter configuration when setting up SR Policies, and we believe a similar mechanism should be introduced in BGP SR Policy. As for how the mechanism should be implemented and the specific packet format, we can discuss them further, and based on the community's input we will refine our document. Thank you.

Susan Hares: I recommend you have a chat with Jeff and Zafar and other people about your idea. Thank you again for your time. Just a moment, folks, in case there's any comment on the last one. Okay. Just take a look at the new slide. Thank you, G's got it up.

Yisong Liu: Okay, thank you. This is Yisong Liu from China Mobile; good morning everyone. I'll present the BGP extension for SRv6 Policy segment list optimization on behalf of my co-authors. The background of this proposal: as we know, SRv6 policies use a list of SIDs to steer traffic from a headend node. The final SID in the list is often the policy endpoint's node SID. But traffic steered over the policy is already destined to the endpoint of the policy, so including the endpoint node SID in the packet's SRH provides no routing benefit. This redundancy wastes valuable header space and reduces forwarding efficiency, especially when service SIDs are present in the SRH. So the purpose of this draft is to define a method to optimize the SID list by excluding the endpoint's node SID (that is, the last node's node SID) for data traffic, while maintaining the ability to use the whole SID list, including the last node's node SID, for other purposes like OAM. We have a series of drafts about this: the SPRING working group has adopted the base draft on SRv6 Policy SID list optimization, and the draft I am presenting now is its BGP SR Policy protocol extension.

For the extension details, we have defined a new flag, the IFN flag, that is, the "include final node SID" flag. The flag definition is aligned across the SPRING, PCEP, and BGP-LS drafts. This flag is carried in the candidate path administrative flags sub-TLV, which is defined in the Lin draft. The flag semantics: when the IFN flag is set to 1, the endpoint node SID must be included; if it is set to 0, the endpoint node SID must not be included, that is, it is excluded for data traffic. For the operational considerations: the controller calculates an SRv6 policy and distributes it to the headend via BGP SR Policy, and the controller sets the IFN flag; if it sets the IFN flag to 0, that means optimization, and the headend receives the policy and applies the logic during packet encapsulation. Here is one use case showing how to use the flag. In the example, we have a policy from PE1 to PE2, and we distribute the candidate path CP1 with the IFN flag; the SID list here is 3::3, 4::4, 5::5. For data packets destined to the VPN, the headend steers the packet into the SRv6 policy, and because the IFN flag is set to 0, it excludes the endpoint node SID, 5::5, from the SID list and includes the endpoint service SID, in this example 5::100, so the resulting SRH SID list is 3::3, 4::4, 5::100. For OAM packets like BFD, the headend uses the full SID list from the policy, and the SRH SID list is 3::3, 4::4, 5::5, as shown in the figure on the right. We co-authors think the draft is stable (this is the second presentation of the draft), and we ask for working group adoption. We also want further review of this draft. Thank you.
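
Here is a small sketch of the headend behavior described in the example, with the flag semantics taken from the presentation; the function and variable names are illustrative only.

```python
def build_srh_sid_list(policy_sids, service_sid, ifn_flag, is_oam):
    """Sketch of the headend's encapsulation choice under the proposed
    IFN flag (semantics assumed from the example above)."""
    if is_oam:
        # OAM (e.g. BFD) always uses the full policy SID list
        return list(policy_sids)
    if ifn_flag == 0:
        # Data traffic with IFN=0: drop the trailing endpoint node SID
        return list(policy_sids[:-1]) + [service_sid]
    return list(policy_sids) + [service_sid]

print(build_srh_sid_list(["3::3", "4::4", "5::5"], "5::100", 0, False))
# -> ['3::3', '4::4', '5::100']
print(build_srh_sid_list(["3::3", "4::4", "5::5"], "5::100", 0, True))
# -> ['3::3', '4::4', '5::5']
```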

Susan Hares: Any comments on this, on the list? Okay, I will ask one question. Do we have enough experience from SRv6 operations to judge between optimizations? Do you know, Yisong? You're working in all of that. The question is deployment experience: is it driving this?

Yisong Liu: Yeah, yeah. Because we always use an explicit path with the SRv6 policy, but if you use a loose path, the last node's SID may be in the policy. And if you use an implicit path, we have no last node's node SID. But we want to distinguish these cases.

Susan Hares: Okay. My suggestion for you, in my chair hat, is to share that information and a link to anything you can publish from SRv6 deployments, to let people know about the experience. That's it.

Yisong Liu: Okay, thank you. Thank you for suggestions.

Susan Hares: Thank you. Okay, happy Friday.

Yao Liu: And I'm Yao Liu from ZTE. This draft is about supplementing the BGP-LS distribution of SR Policies and their state; it's not a very new draft. The background: BGP-LS can report both the configuration and the state of SR Policies from the headend, but some segment-list-related information is not included in BGP-LS. The first item is whether the segment list is in an administrative shut state, because we found that in the network the segment list may sometimes be shut by an administrator. The second is the 32-bit MPLS LSE with non-zero TC and TTL fields. The existing SR Policy structure RFC already supports carrying generic MPLS label information, the 20-bit label, in the Type A SR segment, but the TC and TTL must be set to zero. The MPLS MNA in-stack solution repurposes the TC and TTL fields to carry additional information. MNA, such as NRP or OAM, can be inserted in the SID list in the form of LSEs, and the contents of the LSEs inserted in the SID list may be required by the controller when the headend reports the state of SR Policies via BGP-LS. About the extensions: the first is one new flag in the SR Segment List TLV, the S flag, which indicates that the segment list is in an administrative shut state when set. We removed the B flag, the backup SID list flag, based on the comments we received and to align with the SR Policy architecture. The second extension is an MNA sub-stack sub-TLV in the SR Segment List TLV to carry the MNA sub-stack information inserted in the SID list. This draft was first submitted in 2023, and version 3 was presented last IETF with the newly added MPLS LSE sub-TLV for MNA. Last month we presented version 5 at the IDR interim meeting; in version 5 we removed the backup flag, as I explained on the previous slides. The comments we received: first, what's the relationship between in-stack MNA and the SR Policy architecture? There's an ongoing discussion on the SPRING list, so you are welcome to comment and discuss with us. Second, how should the MNA information be carried in BGP-LS: using a dedicated sub-TLV or a new segment type? In this version we use the dedicated sub-TLV. Third, backward compatibility considerations. This version's updates are the following: the MNA sub-stack sub-TLV replaces the MPLS LSE sub-TLV extension, considering that there seem to be no other use cases, besides MNA, that need to report MPLS LSEs with non-zero TC and TTL inserted in the SID list via BGP-LS. And we added a backward compatibility solution: if you don't recognize the new sub-TLV, you skip it and process the remaining part. That's all, and we also presented this in Monday's MPLS session for review and comments.
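
For illustration, here is a sketch of the two mechanics just described: testing a hypothetical S flag bit, and the skip-unknown backward compatibility rule for sub-TLVs, assuming the 2-octet type/length framing used for BGP-LS TLVs. The bit position and code points are assumptions, not the draft's normative encoding.

```python
import struct

S_FLAG = 0x8000  # hypothetical bit position for the proposed S (shut) flag

def segment_list_is_shut(flags: int) -> bool:
    return bool(flags & S_FLAG)

def walk_subtlvs(data: bytes, handlers: dict) -> None:
    # Unrecognized sub-TLV types (such as the new MNA sub-stack sub-TLV
    # on an old implementation) are stepped over by their declared
    # length, so the rest of the Segment List TLV still parses.
    i = 0
    while i + 4 <= len(data):
        tlv_type, tlv_len = struct.unpack_from("!HH", data, i)
        value = data[i + 4 : i + 4 + tlv_len]
        if tlv_type in handlers:
            handlers[tlv_type](value)
        i += 4 + tlv_len
```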

Susan Hares: Zafar has a question for you. So go ahead, Zafar.

Zafar Ali: Yes, I was asking the same question on the chat. My question is only about the S flag under the SL. Are there implementations that actually allow shutting an SL within a CP, and why is this different from just removing the SL from the list of SLs in the CP? BGP-LS today only supports a CP-level shut construct. So I'd like to see if there are actually implementations; if there are none, then maybe we don't have to define the S flag in the SL. For the rest of it, I have no comments. Thank you.

Susan Hares: Zafar, to clarify your question: your question is why there's a need to shut a segment list, right?

Yao Liu: So in some networks, it totally depends on the network operator. Sometimes they find that a subset of the segment lists is enough, so they shut some segment lists to save energy. That's the first scenario. And sometimes they are doing some replanning of the traffic, so they don't want certain traffic to go through a given SID list; they first shut it to see whether it's still used, watch for a while to see whether the network is running correctly, and then they maybe delete it or do some other operation.

Susan Hares: So just a moment, he had a very simple yes-no question. His question was: do you have implementations of your draft?

Yao Liu: Yes, about the S flag, yes. For MNA, to my understanding there's no commercial deployment of MNA.

Susan Hares: There are no commercial implementations of this draft, but you have implemented it in your lab? Is that what I understand?

Yao Liu: About the S flag, yes. But for the MNA part, no implementation. But I think this is a topic worth discussing.

Susan Hares: Okay. That's all. He had a very simple question, and I'm sure he appreciated the other answer, but he also wanted his question answered. Okay, thank you very much for your time.

Jinzhao: Hi everyone, I'm Jinzhao from China Unicom.

Susan Hares: How is the sound? Can people hear her or not? You need to stand a little closer. Okay. Hold the mic in your hand. You need to be very close. Speak very boldly and loudly.

Jinzhao: Today, on behalf of my team, I'll introduce the draft, draft-ietf-idr-5g-edge-service-metadata. First, I will introduce the comments from the mailing list. The main comment is that using BGP-LS to distribute subnet AS attachment information may generally face scalability issues, as the SAV enforcement information per interface can be very large. Currently, this draft considers the extreme case of the AS customer cone, as defined in RFC 8704 Section 3.6.1: small-to-middle AS customer cones of up to 10,000 prefixes, and large AS customer cones of about 30,000 prefixes. Next, I will introduce the updates after IETF 124. First is the purpose BGP-LS serves in this draft. Instead of collecting SAV state from the data plane of an individual device, it collects all SAV-related information for the entire subnet or AS to which the router is connected. This information enables SAV monitoring, traffic traceback, and service anomaly analysis, for which dynamic real-time acquisition is critical. Next is how BGP-LS is used: routers that support SAV mechanisms or protocols establish BGP-LS sessions with the controller to report their multi-sourced SAV-related information. Next, what SAV-related information is to be carried: it might include the source node interface, source prefix, and validation mode. Next is the BGP-LS attribute for the SAV model. The first is the SAV mode TLV, an optional and non-transitive BGP attribute that carries the validation mode information. It is not restricted to the four existing validation modes, in order to support forward compatibility and future extensibility. The next is the SAV action TLV, which uses the traffic filtering actions defined in RFC 8955 and RFC 8956. A SAV rule might be associated with multiple actions, and conflicts may exist among these actions. More comments and discussion are welcome. Thank you.

Susan Hares: Any questions on the list? Thank you for your excellent presentation.

Mankamana Mishra: Good morning. This is Mankamana Mishra from Cisco Systems. I hope this is the latest slide deck, but it's okay. A year ago we presented a draft that talks about a problem statement in IDR, and while having the discussion about the problem statement, we came up with a bunch of ways to solve it. So today I'm going to present the one we decided to go ahead with. The decision was based on some discussion on the list and some offline discussion, and this draft was presented in BESS as well. Just to recap the problem: in general, there are different label allocation modes in a BGP L3VPN network, and it could be per-prefix, per-VRF, or per-next-hop received label. Depending on the network and the use cases, one of them will be used. But right now our area of interest is per-next-hop received label.

So what happens in this case? Let's say we have this network with a route reflector that is a border router, so next-hop-self is also configured. In this case, when a label is received from PE1 in the base case, both of these route reflectors, which are border routers, are going to allocate a label based on the next hop plus the received label. So if you look at this picture, we have RR1, which received label 100, allocated label 200, and advertised it toward the remote side, PE2. At the same time, the peer RR2, border router 2, does exactly the same thing; both border routers do. When they receive each other's update, what they see is a new next hop and a new label, so they allocate a label once again. And this label allocation keeps happening until we really run out of labels; that was the problem statement we discussed earlier. Two of the solutions we thought of, or rather the ones that had the most votes, were going with the NHC attribute or going with an extended community. But after some discussion, the decision was that it is better to go with the NHC attribute, because it provides more room to extend in the future if there is a need, plus there are all the rules that are anyway defined in the draft mentioned here.
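
A toy model of the looping allocation just described, under simplified assumptions (two border routers with next-hop-self, labels as a simple counter), shows why allocations never converge; it is not the draft's procedure.

```python
# Each border router allocates a fresh local label for every new
# (next hop, label) pair it hears -- including pairs generated by the
# other border router -- so allocations grow without bound.
def simulate(rounds: int = 4) -> int:
    next_label = 100
    seen = set()
    updates = [("PE1", 100)]          # PE1's original advertisement
    for _ in range(rounds):
        fresh = []
        for rr in ("RR1", "RR2"):
            for nh, lbl in updates:
                if nh == rr or (rr, nh, lbl) in seen:
                    continue          # ignore our own or already-seen updates
                seen.add((rr, nh, lbl))
                next_label += 100     # allocate a brand-new local label
                fresh.append((rr, next_label))
        updates = fresh
    return next_label

print(simulate())  # the label counter keeps climbing every round
```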

In the case of the NHC attribute (I think one slide with the picture is somehow missing), we are going to allocate a primary label that will be advertised as part of the NLRI, plus one or more alternative labels attached in the NHC attribute. In the signaling mechanism, we have the NLRI, then we have the NHC attribute; it goes a little to the left side, this somehow got messed up. So as part of the NLRI we are going to send the primary label, and the attribute will carry one or more labels. There was a question during the BESS session: do you really need multiple labels? I would say that is a discussion for the working group: should we restrict it to only one label, or leave it open for multiple labels, where different use cases may want more flexibility? For the forwarding model, the ingress may choose between the primary label and the alternative labels, and the selection could depend on local policy, the type of traffic, or whether you want load balancing or fast reroute. So this will mostly depend on what the operator wants or how the operator wants this to behave.

Some example use cases are fast reroute or a backup path; traffic engineering, if you want a different label for a different path; service differentiation; or a label mapped to a QoS policy. The benefit we get is that we can now have different per-prefix behavior based on the use case and what we want to achieve in our network. It definitely avoids these duplicate routes, and it does solve the infinite looping problem; in fact, after the BESS session last year, some vendors plus some customers did confirm, "Yeah, this problem is genuine and it does exist in the network." The next step: please provide any comments if you have them. There was a question about whether this draft belongs to BESS or IDR. I would say that is more of a logistics question; I would rather first focus on the solution the working group agrees on, and then we can decide whether to move forward in IDR or BESS. That's a small thing. Thank you.

Susan Hares: Any comments on the—remotely or in here? Or we having the Friday as we've got in the chat room, the Friday feel of lack of sleep. Okay, thank you. Hello everyone, this is Jiming from China Mobile and on behalf of my co-authors, I would like to introduce a new draft called draft-ietf-idr-fsv2-ip-basic. Uh, this draft focus on the filtering mechanism that combine FlowSpec with the with destination QP aiming to provide a more refined solution for AI network management and control. So first let me share the motivation. In—in current network architecture, the rapid development of AI services has put forward a high requirements for network transmission both within and between data centers. So specifically, there are large number of elephant flows based on RDMA protocol, RoCE v2 protocol, between GPUs, and these elephant flows are characterized by long life, high throughput, and extreme sensitive to latency and package loss. And meanwhile, in AI cluster, a single task usually lasts for several days, so with a highly stable communication pattern and relatively stable 5-tuple fields. And the elephant flow always bonded to a long-term QP instance. So this caused some problems. First is link hotspot and congestion caused by ECMP hash collision. Due to the fixed 5-tuple fields and the presence of AI elephant flows, multiple similar large flows are hashed onto the same physical link, instantly triggering congestion. And second is lack of dynamic adaptability. Traffic is bonded to QP instance, even if one congestion is detected in on a specific link, traditionally ECMP cannot dynamic migrate existing long-lived flows to ideal link. To address these issues, the industry has attempted to introduce finer grand load balancing mechanism such as flowlet and packet level load balancing. However, this approaches introduce complexity related to package reordering and a higher requirements for network chips. In addition to split the large flow—splitting large flow, common solution include optimizing hash algorithm or relying on the terminal network cards to perform a more—to perform randomization, but all these methods have some limitation. So this draft propose a more simple and direct solution to ext—by extending BGP FlowSpec with QP to implement a QP-based traffic engineering. And and as the QP ID is a unit—the unique—the basic unit of RDMA communication, each each connection and each connection has a unique QP ID, even if multiple flows share similar IP addresses or end ports, their QP ID will always be different. So leveraging this property, we can we can not only split the large flow into fine-grand flows by using QP, but also we can improve the performance of ECMP scheduling by combined dimensional 5-tuple plus QP. And we also can reduce the dependency on on randomization performed by the terminal network cards. So in this draft, we defined a new component type for destination QP from non-IP filters. And then this component is four-byte long and directly corresponding to the QP number in RoCE v2 packet header. So in the whole solution, we also had a controller. The controller is responsible for processing the global network topology and computing the the the optimal forwarding path and generating BGP FlowSpec rules that include destination QP, and the ingress PE is responsible for parsing packages, matching QP, and forwarding matched flows to the des—designed path. 
This design is safe for old devices because the draft follows the standard processing mechanism for unknown types. The mechanism enables direct coordination between RDMA-layer information and network-layer forwarding, equipping the network with explicit awareness and scheduling capability for AI traffic flows. The solution has two advantages. The first is finer-grained traffic steering: the network can achieve finer-grained traffic steering based on QP identifiers, enabling superior load balancing across ECMP groups and enhanced isolation between communication streams. The second is moving from random distribution to controlled scheduling: instead of depending only on randomization at the end devices, this method gives network operators more precise traffic control. As next steps, we are seeking broader review and feedback from the WG, and we are planning to request a new type code point assignment from IANA. I think that is all. Thank you.
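A minimal sketch of what the proposed component's wire encoding might look like, assuming it reuses the RFC 8955 numeric-operator convention for FlowSpec components. The type code and function name are hypothetical placeholders; the presenter said the actual code point would be requested from IANA.

```python
import struct

# Hypothetical type code for the proposed Destination QP component;
# the real value would come from the IANA assignment the authors plan to request.
FS_TYPE_DEST_QP = 42

def encode_dest_qp_component(qp_number: int) -> bytes:
    """Encode a FlowSpec component matching a single destination QP,
    assuming the RFC 8955 numeric-operator encoding is reused."""
    # Operator byte 0xA1: end-of-list=1, value length=4 bytes, eq=1,
    # matching the four-byte component described in the draft.
    return struct.pack("!BBI", FS_TYPE_DEST_QP, 0xA1, qp_number)

# Example: a rule component matching destination QP number 100000.
component = encode_dest_qp_component(100000)
```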

Jeff Tantsura: Jeff Tantsura, NVIDIA. I don't think it's needed. There are zero benefits to doing this centrally. The DQP is at a fixed offset in the BTH; every modern chipset knows how to read it, how to extract and parse the data. It's six-tuple instead of five-tuple hashing; that's all you need. There's zero value in centrally orchestrating it. You just look at the QP ID and add it to the hash, which gives you arguably better load balancing. There are a number of white papers, from Meta I believe, where they've used this technology; the end result is not significantly better than not using it, but practically it makes things much more complicated and, in my personal opinion, unnecessary.
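To make the point concrete, here is a toy sketch of hash-based ECMP selection that folds the RoCE v2 destination QP into the classic 5-tuple. It assumes the standard RoCE v2 UDP port (4791) and the fixed position of DestQP at bytes 5..7 of the BTH; zlib.crc32 merely stands in for whatever hash a real chipset uses.

```python
import struct
import zlib

ROCEV2_UDP_PORT = 4791  # standard RoCE v2 destination UDP port

def ecmp_member(src_ip: bytes, dst_ip: bytes, proto: int,
                src_port: int, dst_port: int, udp_payload: bytes,
                n_links: int) -> int:
    """Pick an ECMP member link from a 6-tuple: the 5-tuple plus the
    RoCE v2 destination QP when the packet carries a BTH."""
    key = src_ip + dst_ip + struct.pack("!BHH", proto, src_port, dst_port)
    if dst_port == ROCEV2_UDP_PORT and len(udp_payload) >= 12:
        # The BTH starts at the UDP payload; DestQP is the 24-bit
        # field at a fixed offset, bytes 5..7 of the BTH.
        key += udp_payload[5:8]
    return zlib.crc32(key) % n_links
```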

Jeff Haas: This is Jeff Haas. A question for you: is this five-tuple-plus-QP steering that you're trying to do? How is the traffic steered, by the entirety of the IPv4 or IPv6 header plus the encapsulated payload? You're trying to use FlowSpec to steer the traffic; this means you must match packets.

Jiming: This solution has two steps: step 1 uses BGP FlowSpec to steer the traffic into the SRv6 policy, and step 2 uses the 5-tuple plus QP to hash the multiple flows to different segment lists.

Jeff Haas: So, agreeing with Jeff: if you don't need to actually steer things based on the tuples, a match mechanism inside of FlowSpec itself is probably not helpful. The second piece of feedback is that FlowSpec is not a fast mechanism. You're programming a firewall, and the firewall will take time to converge. So part of your conversation when you're looking at this mechanism is how fast you need this programming to change. That's it.

Speaker: All set, Jeff? I don't want to cut you off; I'm holding Aijun off for a while to make sure you're covered. Go ahead, Aijun.

Aijun: Yeah, Aijun from China Mobile. My comment is similar to the comment from Jeff. Normally, BGP FlowSpec just filters based on information from the IP headers, which have fixed positions. But the QP information belongs to the payload, so the device needs to find where the QP value is and then match the policy. That may be challenging for the devices and for future deployments.

Susan Hares: Any other comments? Otherwise, we're over time. Thank you, that was an interesting presentation.

Yong Huang: Hello everyone, I'm Yong Huang from Huawei, and today I want to talk about draft-ietf-idr-flowspec-redirect-ip. As we know, BGP FlowSpec specifies the distribution of traffic filtering policies via BGP. It contains two parts: the match condition and the action. Normally, BGP FlowSpec uses the traditional components, fields in the IP packets, to do the match filtering. But in some practical scenarios, we need to do a similar thing for several different destinations with similar rules. For example, in the picture, traffic normally goes through router R1 and then R2 to AS3, but if we want it to go through router R3 for some purpose, we need to use the FlowSpec redirect function to do that. If there are several destination prefixes, that will occupy a lot of ACL entries, and as we know, ACL entries are limited on the device. So we want to make FlowSpec more efficient. With similar action rules, we can aggregate: we can use another identifier, a destination community, for the similar rules. In this scenario, the BGP routes can carry a BGP community, 1:1 in this case, for the four prefixes, and then we need only one rule for this scenario, saving ACL resources. So we need a destination IP community filter. The filter format is similar to the other IP filters, while the value contains a destination IP community.
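As an illustration of the intended semantics (not code from the draft), the following sketch shows how a destination IP community filter could resolve: longest-prefix-match the packet's destination in the local RIB, then test whether the covering route carries the community named in the rule. The toy RIB and function name are hypothetical.

```python
import ipaddress

# Toy RIB: prefix -> communities on the best route. A real router
# would consult its own RIB/FIB rather than a dictionary.
RIB = {
    ipaddress.ip_network("203.0.113.0/26"): {"1:1"},
    ipaddress.ip_network("203.0.113.64/26"): {"1:1"},
    ipaddress.ip_network("198.51.100.0/24"): {"2:2"},
}

def matches_dest_community(dst_ip: str, community: str) -> bool:
    """Longest-prefix-match the destination, then check whether the
    covering route carries the given community (QPPB-style lookup)."""
    addr = ipaddress.ip_address(dst_ip)
    best_len, best_comms = -1, set()
    for prefix, comms in RIB.items():
        if addr in prefix and prefix.prefixlen > best_len:
            best_len, best_comms = prefix.prefixlen, comms
    return community in best_comms

# One FlowSpec rule "destination community 1:1 -> redirect via R3" now
# covers every prefix tagged 1:1, instead of one ACL entry per prefix.
assert matches_dest_community("203.0.113.70", "1:1")
```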

After we presented this at IETF 119, there were some concerns about the use of control-plane attributes, and we have tried to respond to them. In our scenario, there is no AS or community information in the forwarding plane; they are used only in the control plane and then converted to an identifier which represents a kind of traffic classifier. Another point: searching the FIB for more matching information for a flow rule is also done by other traffic-processing technologies like QPPB or uRPF. Or we can consider it an extended IP matching filter for BGP FlowSpec, compared to the basic matching components contained in the IP packets. We want to make the flow rules more efficient to use; otherwise this would be a limitation on our use of BGP FlowSpec rules. So why do this in BGP FlowSpec? Again, we want to save ACL resources and reduce the number of FlowSpec rules when we use BGP FlowSpec, and I think it is a quick implementation on top of the widely used BGP FlowSpec technology. We can reuse the basic BGP unicast route attributes like AS or community, and it makes dynamic flow-rule distribution efficient for northbound applications. We submitted the DIP origin AS filter draft in 2019, the other filter draft was introduced later, and we presented the draft at IETF 119. Welcome feedback and comments to improve the draft. Thank you.

Susan Hares: Any comments on this? I know we're heading into FlowSpec and some of the things that are hopeful for that, but there's a lot of work to do. The authors of each of these need a grounding in: what's the real use case? What's the flow inside of IDR? Meaning, the flow for FlowSpec goes outward in a wide distribution, but getting information back to check on it is a different problem. So please give the authors some comments and go forward. Aijun, if you want to talk, you've got the next one.

Aijun: Yeah, Aijun from China Mobile. I think the author has explained clearly the benefit of using the AS information to replace the traditional prefix filter rules. This can significantly reduce the administration overhead and also the number of matching rules. Although we call it BGP FlowSpec, I think any rule that can be applied to the packet flow can be classified under BGP FlowSpec; we should not limit it to only rules related to fields in the packet. Another reason we propose such a solution is that the router already has the mapping between the IP prefix and the AS. So the controller need only download the AS information to the router, and the router can do the rest of the work for the flow engineering. Okay.

Susan Hares: Thank you, Aijun. It is likely that we will see a lot of these filters, so think about this particular way to combine filters, and think about more of them. There seems to be, in some of the discussions in other working groups, a need for some filters. Thank you, Yong. We have one last presentation: a student giving us some of his concerns and interests. Give me a moment. I need to explain that he presented his draft on the list, and several people kindly gave him feedback about the mechanism, so I asked him to go back, as part of his research, and give us the problem statement, because sometimes what we need to hear is the problem statement. This is his first presentation at an IDR meeting. Be gracious.

Yuhang Li: Hello everyone, my name is Yuhang Li, and I'm from Tsinghua University. I'm glad to present a problem statement about a specific BGP convergence problem and to request the working group's feedback. I will first focus on slides 5 to 9. It is a very specific case. A route is withdrawn correctly; the receiving AS removes it and runs best-path selection again. However, some other candidate paths may still appear usable at that moment, and those candidates can later also be invalidated as more updates arrive. So the network does converge, but it takes several rounds of exploration. This is not a stuck-route problem, and this is not a zombie-route problem. The signaling is valid; the problem is the delayed elimination of invalid alternatives. It is a normal but expensive convergence process after a valid withdraw.

This is a simple example. AS1 announces a prefix P/23 to AS3, and it also announces a more specific prefix P/24 to AS2. When AS1 withdraws prefix P/24, convergence at AS3 may not be achieved in one step. AS3 may explore paths: first 2-1, then 5-2-1, and then 4-5-2-1. Only after that does it converge. So the point is very simple: after a valid withdraw, a downstream AS may explore paths that look available for a while but will later fail. So we scope the problem narrowly. The question is: how can we reduce unnecessary path exploration after a valid route withdraw? We are not talking about fault or session liveness detection, permanently stuck or zombie routes, policy-caused route rejection, or security filtering behavior in general. We only focus on the case where the withdraw is valid and routing state is updated normally, but convergence still takes multiple rounds.
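A deliberately simplified trace of the example above, assuming each remaining candidate is invalidated by one additional update per round; it only illustrates why convergence takes multiple rounds, not any proposed fix.

```python
# AS_PATHs AS3 may try after the valid withdraw of P/24, in preference order.
candidates = ["2 1", "5 2 1", "4 5 2 1"]

rounds = 0
while candidates:
    chosen = candidates.pop(0)   # best remaining path this round
    rounds += 1
    print(f"round {rounds}: AS3 selects AS_PATH {chosen}")
    # ...a later update invalidates `chosen`, forcing another round...
print(f"converged after {rounds} rounds of path exploration")
```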

The limited scope leads to a limited goal. We want to characterize this behavior, and we want to understand when alternative paths are explored unnecessarily. We want to know whether earlier invalidation knowledge could reduce the convergence cost, and we want to think about that without changing the core semantics of BGP withdraws. To be clear, I'm not asking the working group to evaluate a protocol design today. I'm asking whether this scoped convergence problem is real, important, and worth further study in IDR, and what operational scenarios should ground it.

Okay, this is the design question. The question is: after a valid withdraw, can a downstream AS avoid exploring alternative paths that are already doomed to fail? More specifically: Can invalid alternatives be recognized earlier? Can convergence reach the final state with fewer exploration rounds? Can this be achieved with operationally realistic mechanisms? At this stage, this is a design question rather than a fixed mechanism being proposed today. Okay, thank you very much; I'm looking forward to your feedback and comments.

Susan Hares: So, like many of you, I have read lots of BGP convergence documents over the years. I simply ask that if he's got a good research problem that you think is useful in today's environment, for either the data centers or the edge or something else, you give him some feedback. If he's really going down a path that isn't useful, also give him that feedback. We're hopeful that new minds can think of new solutions where some of us fail, but they need to be guided toward the problems that matter. And with that, we are done. You have one minute extra. So, thanks to everybody for making us be on time. Thank you again.

Ketan Talaulikar: Hi, I think Acee was in the queue. I was just saying Acee is in the queue.

Susan Hares: Go ahead, Ketan, you always get top priority.

Ketan Talaulikar: No, no. Acee was in the queue.

Susan Hares: Thank you. Okay, Acee.

Acee Lindem: Can you hear me? Acee Lindem, Arrcus.

Susan Hares: Yes, we can hear you. Go ahead.

Acee Lindem: Okay. Yeah, we recognized this problem for link failures; it's pretty hard for node failures. We addressed it for BGP SPF. What we do is, rather than withdrawing the link, the originator of the NLRI for the link advertises it as down, and it's held for a little while. Look at RFC 9815, Section 6.5.1. So we did recognize this exact problem, the one you had the picture of, for BGP SPF.
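A rough sketch of the behavior Acee describes, under the assumption that the originator simply re-advertises the link NLRI as down and withdraws it after a hold interval; the class, callbacks, and timer value are hypothetical, so see RFC 9815, Section 6.5.1 for the actual mechanism.

```python
import threading

HOLD_SECONDS = 30.0  # illustrative hold time only, not taken from RFC 9815

class LinkNlriOriginator:
    """On link failure, advertise the link NLRI as 'down' instead of
    withdrawing it, then withdraw after the hold interval, so peers
    stop using the link at once rather than hunting through stale paths."""

    def __init__(self, advertise, withdraw):
        self.advertise = advertise   # callable(nlri, down: bool)
        self.withdraw = withdraw     # callable(nlri)

    def link_failed(self, nlri):
        self.advertise(nlri, down=True)
        threading.Timer(HOLD_SECONDS, self.withdraw, args=(nlri,)).start()
```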

Susan Hares: That was useful information. It's in the chat: he's asking you to look at RFC 9815, Section 6.5.1. The chat will be available. Aijun, did you have something?

Aijun: Yeah, I have discussed this idea with the author offline, and I think the reason Acee has that reaction may be that their draft wasn't written very clearly. It is not about advertising link information within BGP; they want to solve a problem in a more complex scenario with inter-AS links, not links within an AS. I think maybe they should update their draft and later seek more discussion on the list.

Susan Hares: Thank you; take his advice. And again, thank you for your patience in going through all of this. We'll close the working group session here. Thank you very much.