Session Date/Time: 19 Mar 2026 08:30
This is a complete verbatim transcript of the IETF 125 BIER session.
Presentation: 0-bier-wg-status (Slides: https://datatracker.ietf.org/meeting/125/materials/slides-125-bier-0-bier-wg-status-00)
Sandy Zhang: So, it’s time to start the meeting. Welcome to the IETF 125 BIER session. Please note that this session is being recorded. Our chair, Tony, is attending the meeting remotely, so Jeffrey is helping me host the meeting in the room. And we’d like to introduce some things about the IETF. Please see the Note Well, especially if you are new to the IETF. And these are the meeting tips. Please note that you can use the onsite tool to join the queue and speak your comments or questions. These are the materials for this meeting, and you can find them there. This is the link for the BIER minutes, and anyone who comments or has questions can help with the BIER minutes here. So, this is our agenda today. If somebody has opinions, please tell us. This is our working group document status. We now have two documents in the RFC Editor queue, and they will be published soon. And we also have a draft in the IESG, draft-ietf-bier-ping. Yeah. And these are some drafts ready for the next step: last call and directorate review. To the authors of the non-MPLS extensions, please refresh your draft so we can move it on. And the other one, draft-ietf-bier-ospfv3-extensions, is in last call in the LSR working group. Yeah. Gunter, do you have any...
Gunter Van de Velde: Yes, hello, Gunter Van de Velde, Routing AD. Can you go back one slide, please? Yes, so, draft-ietf-bier-ping. Good news: all the DISCUSSes have actually been resolved, so the thing can progress in theory. Now, there was one main observation which would probably require some involvement from the working group itself. Ketan observed that the document is mainly about MPLS as a data plane, from the ping perspective. And one of the open questions is whether there should be a probe to check with the working group if this document is actually intended to focus on all data planes, or only the MPLS one at this point in time. So if possible, I think it would be interesting to do a quick probe on this draft with the working group before moving it forward, to see if this thing should be mainly for MPLS, with maybe another extension to this work for the non-MPLS data planes at a later point in time.
Tony Przygienda: Makes sense. I’m slightly baffled as to what is specific to MPLS in the ping draft? This stuff should hold over any data plane.
Gunter Van de Velde: So I would recommend reading the thread from Greg, and I think the last email from Ketan on this topic. So there are certain things; it is more of an open question. Basically, like I said, the thing is approved, huh, it can move forward, but this is more like a probe of what the working group actually thinks: is it complete for all data planes, or is it just for MPLS?
Tony Przygienda: Yeah, I mean, general observation, and I’m trying to do a little bit of flogging behind the scenes: we are blocked on this non-MPLS draft that should have moved forward. Tons of stuff is blocked on these things, like draft-ietf-bier-ospfv3-extensions and so on. All right, so I’ll have a look at the BIER ping and show some reaction. Thanks. Mm-hm.
Hooman Bidgoli: Yes, sorry, Hooman Bidgoli from Nokia. So there are two other drafts that I don't see here. One is the mLDP signaling over BIER, which I think we did last call on in excess of ten times.
Sandy Zhang: I sent an email to you several days ago, but you haven't answered me. Yeah. I asked you about the PIM signaling and mLDP signaling drafts. Do you have any suggestions for moving these two drafts forward?
Hooman Bidgoli: I thought we did make a decision that they have gone through last call; there was a shepherd assigned to them and everything. We've been talking about that for the past three IETFs, saying let's push it through. Last call is done, everything is done, so I'm not sure what other discussions we want to have.
Sandy Zhang: The first step, refresh your draft.
Hooman Bidgoli: Okay. I did that last night, and here I am again. I refreshed it in 2025, and here it is, expired again. So maybe it's a question to the chairs: I'll refresh it again, I don't have a problem with that, but is it going to go through, or are we going to have this conversation a year from now again?
Sandy Zhang: Tony, do you have anything?
Tony Przygienda: We should not. We actually... I went through all the outstanding drafts with a fine-toothed comb, and Sandy fired off stuff. There were IPRs missing, all that kind of stuff. This one I don’t recall, you know, or it slipped my mind. But Sandy sent out OSPFv3 extensions, BGP-LS BIER... no, actually, it looks to me like nothing came forward on that one. Okay, it's on my to-do list. Yeah, thanks, Hooman.
Hooman Bidgoli: Okay, I'll refresh it. Thank you very much.
Sandy Zhang: Yeah, thanks. Yes. And these are some drafts for which an IPR call, or a second IPR call, has been started; authors and contributors, please respond on the mailing list. Yeah. So that's all for this meeting... oh, another one: we have the BIER use cases, because something new needs to be added to the use cases draft, so we will wait for the authors to update it. Yeah. So that's all for this meeting. Yeah. Gunter, do you have anything else you want to say?
Gunter Van de Velde: So, Gunter again, Routing AD. One quick thing: what we have noticed in the last couple of months is that the current IESG is very strict on charters and the documents in the charter; documents have to comply with the charter and things like that. I've had my churn in those discussions right now. So basically my request to the chairs, and to the working group also, is: if we adopt documents, or if we push documents towards the IESG, please make sure that it is very clearly understood why they actually comply with the charter at this point in time. If they do not, it's not a big problem; we can just update the charter if necessary.
Tony Przygienda: Point taken. Thanks Gunter.
Sandy Zhang: Okay. So let's start the first presentation. Please wait for me. Hooman, you are the first one. Yeah.
Presentation: 1-EANTC-2026 (Slides: https://datatracker.ietf.org/meeting/125/materials/slides-125-bier-1-eantc-2026-00)
Hooman Bidgoli: Yeah, so we had some good news with regard to BIER interop. We went to EANTC again this year to make sure that, you know, end-to-end BIER interop is working. Just FYI, I presented this in PIM, so it might be a little bit repetitive for the people that were in PIM, but bear with me. So in 2024 we started with BIER at, I guess, just BFR functionality, BIER forwarding functionality, and that was great between all vendors: Huawei, Juniper, and Nokia. And then in 2025 we decided to go one step further, one layer up, to Next Generation MVPN, and try to test NG-MVPN to make sure that there is compatibility between all the vendors. We only tested it with I-PMSI; maybe next year we can take it to S-PMSI. But the good news is that going forward we see interop between all vendors. There is a little bit of gluing here, and Nokia is doing the gluing, but here's the story. So in 2025 what we did is we tried to bring up the NG-MVPN with underlay ISIS and overlay BGP. It was an IPv4 network. As some of you know, when you're using NG-MVPN, you need a regular unicast VPRN to resolve the source, that is, what is the next hop for the source. For that regular unicast VPRN, we used LDP in all cases. The reason for that was that with SR-ISIS we had a little bit of a hiccup, so again, maybe next year we can try SR-ISIS for the unicast as well. But for now, that unicast is using LDP tunnels. So in 2025, Huawei and Nokia were interoperating; there was no interop with Juniper. The reason for that, as this slide kind of explains, is that Juniper was using a global, well, downstream-assigned VC label, or global context as we now call it, and Huawei and Nokia were using an upstream-assigned VC label. As you folks know, whether that VC label is downstream-assigned (global) or context-specific comes into the BIER Next Protocol, and the data path is very strict about it.
So if this BIER Next Protocol is not consistent, is not set correctly based on the VC label type, then the data path starts dropping the traffic. And as Juniper was using BIER Next Protocol 1 for the globally assigned VC label, they were sending us the traffic and we were dropping it; Huawei was dropping it. But between Huawei and Nokia there was interop, because we were both using BIER Next Protocol 2, which is the upstream-assigned, or context-specific, VC label. So in 2026, obviously, we wanted to make sure that HPE, now, and Nokia have a story as well. So we brought up the network again, same story: ISIS underlay, BGP overlay for BIER I-PMSI. And this time, even though Nokia is still assigning the VC label as upstream-assigned, so context-specific, Nokia introduced a knob that sets that BIER Next Protocol to 1. So we send BIER Next Protocol as 1 and we receive it as 1. Not completely in line with the RFC, but for the greater cause of BIER we decided to put that knob in there. And what we saw was complete interop between HPE and Nokia. Traffic was going back and forth, no problem. The transit routers were behaving correctly, so everything was shiny and the sun was shining upon us at EANTC 2026. What happened, is it frozen? Go back, forward, no? Oh, there you go. Okay, so here's the same table for 2026. I did not add Huawei here because they were not part of it. One thing that you can see here is that all the vendors, Huawei, Nokia, and HPE, are using this DCB, Domain-wide Common Block. Again, I think we are not, at least Nokia, I'm not going to talk about everybody else, is not 100% in line with the RFC, because the RFC says that if you are using DCB you need to signal it in MP-BGP with a flag. And, you know, we agreed with HPE that we are just going to use DCB mode without signaling it. So that's one thing, just FYI.
But yeah, I mean, Nokia was using a context-specific VC label, Juniper was using a global VC label, Protocol 1 for HPE, Protocol 2 for Nokia, and with that knob, when we enabled it, saying that Nokia accepts BIER Protocol 1 and sends BIER Protocol 1, everybody was happy and the data path was forwarding. So yeah, going forward, I think it's very important to put multicast on the roadmap for EANTC. So next year we're going to try to bring BIER with S-PMSI. But yeah, going forward, I think, you know, for IPv6 solutions and everything, as a multicast group we should have more traction at EANTC. It's a little bit quiet on the multicast side, so we definitely want to bring that into the light. That's all I have. Questions, comments?
Tony Przygienda: Tony, quickly chiming in. I would be leery of unifying the upstream and the downstream, because then, to get backwards compatibility, we'd kind of need to introduce a new protocol value saying "either", okay, and then you have all these possible combinations you have to support when you configure and deploy. And the second one: I can't point to anything specific, but we may end up having specific procedures for upstream and downstream, and we'd still have to differentiate which type of label it is in the future. So that would be my observation. Thanks.
Hooman Bidgoli: Yeah, so I mean, again, all I can say here is the experience that I've had for the past three years at EANTC trying to interop BIER between different vendors. My experience is very simple: that Next Protocol, upstream or downstream, context or global, whatever you want to call it, is creating more issues than it is solving. But again, as I said, I leave it to the working group. All I can give you is the feedback, and my feedback is that we've spent three years fighting this field, whatever it is, is it two bytes, one byte, whatever it is. And the data path is extremely picky about that field. So as you can see, it's becoming ugly, right? I mean, Nokia can interop with anybody, with Huawei, with Juniper, but right now HPE doesn't interop with Huawei because of that, and based on the testing and the proof of concept that we have done, literally everybody can do two, or everybody can do one; it's not doing anything extra. That's my feedback, but I mean, I leave it to the working group.
Toerless Eckert: I think last time, when the draft was presented, we cleared up that you should be fine on that side with the new interpretation of what we're doing, and you just don't need the differentiation in your implementation, but somebody might use it to distinguish whether you're going to do a lookup in a context table or a global table, right? Because you're not doing that in your implementation, you don't see a benefit in the difference in the label, right? In upstream versus downstream, well, not upstream-downstream, right, but global versus context label lookup. But if somebody does make that choice, there is, you know, greater scalability coming out of that, more work to implement, but then it's needed, right? So that's why I think the conclusion was: it doesn't hurt you in your implementation, and it can help somebody who does another implementation. Jeffrey, am I summarizing that halfway correctly?
Jeffrey Zhang: Yes, you did. So indeed, if the Proto field is set to 1, an implementation can just put that label into the global table, a global label table, and that has tremendous resource savings. And that's exactly why RFC 9573 was put in place. But indeed, any implementation can choose to still just put the label in the context label table and do it that way, as Toerless just said. It doesn't hurt you, and it just makes other implementations easier. So we appreciate that you did the work to allow this interop.
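As a rough illustration of the lookup choice being discussed, a minimal Python sketch (function and table names are hypothetical; only the Proto semantics, 1 for a downstream-assigned label resolvable in a global table and 2 for an upstream-assigned label scoped to the allocating BFIR, follow RFC 8296):

```python
def resolve_vc_label(proto: int, label: int, bfir_id: int,
                     global_table: dict, context_tables: dict):
    """Return the VPN forwarding entry for the VC label that follows the BIER header."""
    if proto == 1:
        # Downstream-assigned label: a single domain-wide label table suffices,
        # which is the resource saving mentioned above.
        return global_table[label]
    if proto == 2:
        # Upstream-assigned label: scoped to the BFIR that allocated it, so the
        # lookup needs one context table per BFIR (keyed here by BFIR-id).
        return context_tables[bfir_id][label]
    raise ValueError(f"unexpected BIER Proto value {proto}")
```

The interop issue in the room follows directly from this sketch: the same label value resolves through entirely different tables depending on the Proto value, so a mismatch between sender and receiver silently selects the wrong table.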
Hooman Bidgoli: Yeah, again, at this point in time Nokia couldn't care less whether it's one or two. All I'm trying to say is that if in the future this thing, BIER, starts getting more popular and more vendors come into the picture, this one-and-two will create chaos. Again, all I'm trying to say is that right now I couldn't care less. Huawei is doing two and they're going to drop it, but Juniper is doing one. So, you know, maybe you should just think about doing number one, the HPE way, and going forward just tell everybody, you know, it's an MPLS label, it's one; that way HPE doesn't have to do any changes. But, you know, maybe one is the answer. Just do one and be done with it.
Jeffrey Zhang: So last year, or maybe two years ago, we started working on that draft, and I think we're getting to consensus, except for some wording we need to change to make sure it doesn't sound like Nokia's behavior is breaking the RFC. It's not; indeed, we'll make that clarification, make that change. And hopefully after that, the draft will come through and clarify a lot of things.
Tony Przygienda: Yeah, yeah, I think the agreement was that you guys hammer out the draft, clear up this not-precisely-enough-specified RFC, and basically lay out the options, why the stuff should be retained or not, and what the benefits would be, okay? And if that gets pushed as a standard, then that should clarify the standard. I don't see the potential for, you know, confusion or chaos, as you say, Hooman, okay, if the RFC is clearly written. And, you know, I concur with Toerless and Jeffrey, right: to abandon the stuff to potentially simplify the case now will cost us in the future. All right.
Hooman Bidgoli: Yeah, I agree. I mean, maybe in this new draft we can put in wording that says what we did at EANTC, right? That this implementation works if you decide to go with BIER Next Protocol 1 but you are context-specific. And if we put in wording like that, then maybe anybody that wants to implement this in the future, we can point them to BIER Next Protocol 1, so they can have interop with everybody. Rather than sitting there implementing something, and then we go to EANTC and there is no interop. Just an idea. Anyway, that's all I got.
Tony Przygienda: Yeah, I think the other part is what I was asking you in PIM, right? I'd love to see us find some time for, even if just informationally, an appendix kind of thing with the DCB configuration, just so that we have a real example of a setup in the document, especially if everybody agrees that using DCB is operationally preferred, right? That we have some complete example of that. And then, I'm not sure if there was something about the signaling part, whether the signaling part is finished or not, right? But given how we didn't find time to update the draft this time, let's just make sure we do it before Vienna.
Hooman Bidgoli: Yeah, agreed, agreed. That's a good point. So with the DCB, again, Jeffrey knows this better than I do, I can't recall that entire RFC, but there is something in there that says if we are doing DCB, you need to signal it; there is a flag or something. Nokia and Huawei, we don't set that flag, and we agreed with HPE that we will do DCB without setting that flag, and HPE is actually behaving correctly, you know, going to DCB. So we kind of need to clarify this whole thing.
Tony Przygienda: And if people agree on that, then basically we just make this thing an update of the RFC, saying if you have this configuration, then that supersedes what the, you know, configuration-free signaling draft or RFC is saying. So I mean, we can fix it that way.
Hooman Bidgoli: Agreed, agreed. Okay, thank you.
Tony Przygienda: Yeah, thanks for the work. Mm-hm.
Presentation: Export of BIER Information in IPFIX (Slides: https://datatracker.ietf.org/meeting/125/materials/slides-125-bier-export-of-bier-information-in-ipfix-00)
Sandy Zhang: Good afternoon, everyone. I’ll present the export of BIER information in IPFIX on behalf of my co-authors. Firstly, the motivation. Since I’m presenting in the BIER working group, we’ll skip what BIER is. The challenge for monitoring BIER flows is that BIER has its own specific packet header format, so the existing IPFIX Information Elements, that is, IEs, cannot export its key flow parameters. This creates a monitoring blind spot. The purpose of this draft is to define dedicated BIER IPFIX IEs, empowering network operators to use the standard IPFIX tooling for real-time monitoring, in-depth analysis, and rapid troubleshooting of BIER flows. We have defined new IPFIX Information Elements in this draft, 11 new IEs, to capture the BIER header information. The new IEs can answer the key questions about monitoring a BIER packet: is this packet a BIER packet, is it from an MPLS or non-MPLS network, what is the BIER packet TTL, what payload type follows the BIER header, and which egress routers are the packet's destinations. We base this on RFC 8296. Firstly, we define the BIER forwarding IEs. These IEs export the fundamental fields from the BIER header used for forwarding. The first is the BIFT-id section, which includes the BIFT-id field, the traffic class, and the S bit; this determines the specific forwarding table used by the packet. The second is the BIER BitString, which I think is the most important field in the BIER packet; it exports the BitString field together with the SI and SD, and it can identify all the destination BFERs. The third is the BFIR-id, which identifies the ingress router of the packet. And the fourth is the TTL. The second part is the packet context IEs. These IEs provide additional context about the packet's encapsulation, payload, and handling requirements. The first indicates whether BIER is from an MPLS network or not.
It's a Boolean value. The IE indicates whether the BIER packet originated from an MPLS network or a non-MPLS network. The second is the Next Protocol, which identifies the type of the payload, like IPv4, IPv6, or Ethernet. The third is the BIER version. The fourth is the BSL, which indicates the length of the BitString. The third part is the additional handling and OAM Information Elements. These elements are used for traffic engineering, load balancing, and OAM: the BIER entropy for load balancing, the DSCP for QoS differentiation, and the OAM section for performance monitoring. This is an initial draft of IPFIX for BIER, so we want to seek some feedback from the BIER group. You are welcome to get involved in this draft. Thank you.
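For concreteness, here is a minimal Python sketch (my own illustration, not taken from the draft) of extracting the RFC 8296 BIER header fields that the proposed IEs would export; the field offsets follow the RFC 8296 header layout, and the dictionary key names are made up for this example:

```python
import struct

def parse_bier_header(pkt: bytes) -> dict:
    """Extract the RFC 8296 BIER header fields that the proposed IEs would export."""
    w0, w1, w2 = struct.unpack_from("!III", pkt, 0)   # first three 32-bit words
    bsl = (w1 >> 20) & 0xF
    bs_bytes = (1 << (bsl + 5)) // 8                  # BitString length is 2^(BSL+5) bits
    return {
        "bift_id":  w0 >> 12,                         # 20 bits: BIFT context (SD/SI/BSL)
        "tc":       (w0 >> 9) & 0x7,                  # Traffic Class
        "s_bit":    (w0 >> 8) & 0x1,
        "ttl":      w0 & 0xFF,
        "nibble":   w1 >> 28,                         # 0101 for BIER
        "version":  (w1 >> 24) & 0xF,
        "bsl":      bsl,
        "bitstring_bits": bs_bytes * 8,
        "entropy":  w1 & 0xFFFFF,                     # for load balancing
        "oam":      w2 >> 30,                         # OAM bits
        "dscp":     (w2 >> 22) & 0x3F,
        "proto":    (w2 >> 16) & 0x3F,                # Next Protocol of the payload
        "bfir_id":  w2 & 0xFFFF,
        "bitstring": pkt[12:12 + bs_bytes],           # identifies the destination BFERs
    }
```

An IPFIX exporter would map each of these extracted values onto one of the new IEs; the actual IE numbers would of course be assigned by IANA, not by this sketch.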
Tony Przygienda: So, Tony here, HPE. Yes, of course: very valuable, very good, necessary. My only question would be how you envision tying the label together with the sub-domain, the set, and so on, right? Because if a monitoring station cannot understand what the label means, it's hard to do anything sensible with it. Do we assume that somebody will monitor, whatever, all the IGP packets and somehow reconcile that? Or some other channel? Because, think about it: I'm getting this stream and it's wonderful, right? I see the whole BIER header. But the problem is that in front of that I have the label. And I don't know what the label means, which sub-domain it is, which set it is, right? It's interface-specific already, right? Because it's interface to interface, so I know who sent it. So somehow, first I need to know at which interface the packet came in; I assume IPFIX has that. But then you see this label in front, and you don't know what it means, which sub-domain, which set it is, so it's hard to figure out what BIER is really doing, unless you have the BIFT, right? The forwarding table, which you have to get via some channel. So I think if you write this draft, it would be good to have some consideration of how I reconstruct what this label means in terms of what I'm observing.
Sandy Zhang: Okay, okay, we'll consider that and find a way to solve it.
Tony Przygienda: Well, you don't have to find a way in this draft, but I think it would be valuable to say: here are the considerations, right? These would be possible approaches for how you correlate this information for it to be useful. You don't have to say "do this or do this", but it needs consideration, because otherwise it's just a data stream that you cannot really do much with, because you don't know what the label signifies. Okay? But generally I think it's super valuable, it's necessary, super. It's just that this considerations section, I think, would help a lot in the deployment and use of that stuff. Thanks.
Sandy Zhang: Thank you.
Presentation: Multicast Use Cases for Large Language Model Synchronization (Slides: https://datatracker.ietf.org/meeting/125/materials/slides-125-bier-multicast-use-cases-for-large-language-model-synchronization-00)
Sandy Zhang: Hello, everyone. I’ll give the second presentation, about multicast use cases for LLM synchronization, on behalf of my co-authors. This is the scenario of LLM synchronization in inference clouds. The emerging inference cloud services deliver large-scale real-time inference, fine-tuning, and model optimization services on GPU cloud platforms. In this figure, we can see multi-cloud LLM synchronization: the centralized model repositories automatically replicate and synchronize the LLMs to geographically distributed GPU clouds. As in the figure, we have several GPU cloud platforms, and every platform has its local storage. But these distributed GPU cloud platforms can be in different regions, or even in different operator networks. Next slide, please. The challenges of this scenario: the first is high concurrency. A popular large model, with a size of 70 gigabytes to one terabyte, may be downloaded simultaneously across dozens of GPU clouds, leading to input/output bottlenecks at the storage repository. This delays model distribution at scale. The second challenge is cold-start latency. Inference services cannot start until the model is fully downloaded to the GPU cloud, so low download efficiency leads to significant cold-start latency, which delays user access to inference. This synchronization process is separate from the training and inference procedures, but it directly affects the efficiency and reliability of inference service delivery. So why is multicast needed in this scenario? Because synchronizing large models to multiple GPU clouds is a typical multicast case; it's obviously a point-to-multipoint pattern. Second, it reduces the I/O bottlenecks from simultaneous downloads, improves transmission efficiency, and minimizes cold-start latency.
And the GPU clouds span multiple regions, even multiple operator networks, so the multicast technology must be capable of operating across these networks, like across the core network and the metro network. We have several candidate multicast technologies for this requirement. PIM-SM, the traditional multicast technology, requires a multicast tree to be established in advance, and all nodes along the tree must maintain multicast flow state. It is slow to respond to network topology changes, though that can be improved by fast-reroute mechanisms like MoFRR. It is suitable for scenarios where the set of GPU clouds is relatively fixed. SR-P2MP relies on a controller to implement multicast traffic engineering; the replication nodes require state, and the multicast tunnel must be established beforehand. It is also slow to respond to network topology changes, which can be improved by fast-reroute mechanisms like TI-LFA. It is likewise suitable for scenarios where the set of GPU clouds is relatively fixed. BIER is a stateless multicast technology: there is no need to establish a multicast tree in advance, it responds quickly to network topology changes, and there is no requirement for the set of destination GPU clouds to be fixed. So this draft initiates the discussion of this scenario. We want more discussion about the more detailed requirements and potential gaps. Thank you. Any comments?
Hooman Bidgoli: Yeah, Hooman, Nokia. I mentioned this in the other meeting too. I think BIER is great. I think one thing that we need to figure out is: is it going to be BIER, IP, UDP, and then the AI payload, or can we optimize BIER somehow to get rid of the IP/UDP and just have the AI payload? I don't think this working group is knowledgeable enough about AI to see if there is any optimization there or not, but it would be great if we could close that knowledge-transfer gap, to see if there are some optimizations here, and if there are, what drafts or what we need to do. That's the only comment I've got.
Sandy Zhang: Thank you. Thank you for your suggestion.
Tony Przygienda: Well, we have the Ethernet types; nothing prevents you from carrying the stuff naked, right? It doesn't have to be carried in IP. And it doesn't presuppose an MPLS plane either, in this sense. Yeah.
Hooman Bidgoli: Yeah, sorry, Tony, I 100% agree with you, right? I mean, I think the BIER technology can greatly simplify this if we understand exactly what the AI layer is, whatever it is, maybe a new protocol or something, a new BIER Next Protocol. That's what I'm trying to say.
Tony Przygienda: Yeah, that's absolutely possible, and, you know, I know that when it comes to HPC and all this AI stuff, header optimization is a big issue. So if the AI folks show up and say, look, we need to truncate the header, have a simplified BIER header, a new version, whatever, sure, all of that I think we are chartered to do. But we need really clear requirements for what is needed and how they'll deploy it, right? Otherwise we'll just go and shoot in the dark and optimize towards something we don't understand or that doesn't matter. Thanks.
Toerless Eckert: Well, um, one of the basic ideas was to see that most applications that we would want to replace unicast with are just, you know, doing replication to a set of destinations, right? So one of the abstractions could simply be to say: okay, here is the list of destinations you want to replicate to, and if you have BIER it's faster, and if you don't have BIER you're using unicast, right? A simple layer like that to introduce BIER into these environments.
Tony Przygienda: Yeah, okay, but I mean, that's a layer above BIER. There's nothing we can do about it; in a sense it's an application optimization or a kernel optimization, something like that. So how would we get involved? I mean, we have the technology; the U-BIER stuff is on the floor, bar a better suggestion, right? Sorry, I'm digressing a little bit here: I really liked this gap analysis that was shown in the routing working group. It was the one graph that basically covered where it is useful and what we are lacking. And the only gap in BIER, to cover the whole envelope as the optimal multicast technology, was really this sparse-groups optimization, right? But anything above that, where an application chooses to go over BIER or not go over BIER, I don't see how we can get involved to make things any better, you know, as a working group.
Toerless Eckert: No, I think the important part is to understand that as long as we don't get rid of the IP multicast layer, we're stuck with the limitations of how fast, you know, receivers can join groups and so on, right? That's the stuff we want to get rid of if you really want to leverage the application's ability to quickly send a packet now to these 50 receivers, now to the other 50 receivers, the stuff that you do in Hadoop and in many of these applications.
Tony Przygienda: Yeah, Toerless, but the question still stands: what can we do as a working group here? I don't think it's in our scope, nor can we productively do anything. I agree with what you say, but I don't see how we get involved as a working group in that stuff.
Sandy Zhang: Toerless, Toerless. Let's move to the next presentation. Yeah. Thanks for the work. Mm-hm.
Presentation: 4-Scalable Data Plane Architecture for BIER (Slides: https://datatracker.ietf.org/meeting/125/materials/slides-125-bier-scalable-data-plane-architecture-for-bier-00)
Zhiqiang Li: Hello, everyone. I’m Zhiqiang Li from China Mobile. I’ll present the draft on a scalable data plane architecture for BIER on behalf of my co-authors. First, why revisit BIER forwarding now? BIER forwards multicast traffic without building explicit multicast trees in the transit network. The BFIR encodes the destination set directly in the bit string, so the core stays stateless on a per-flow basis. That model is attractive for large domains, multi-tenant data centers, and AI/ML clusters, where multicast load can be bursty and short-lived. The challenge is that large deployments often require long bit strings, 256 bits or more, which stress high-speed forwarding hardware. A quick retrospective on BIER forwarding: each bit in the bit string represents one egress BFR, or BFER. The BFIR sets the bits for the intended destination set and imposes the BIER encapsulation. Transit BFRs consult the Bit Index Forwarding Table, or BIFT, to determine next hops and forwarding masks. The BIFT-id identifies the forwarding context; in non-MPLS BIER it uniquely maps to SD, SI, and BSL across the domain. The RFC 8279 procedure briefly has four steps, but it puts latency pressure and table, or resource, pressure on the forwarding node. First, latency pressure: processing cost grows with the BSL. Scanning 256 or more positions is expensive at 100-gigabit or 400-gigabit line rate; a direct per-bit pipeline can stretch the stage budget and limit throughput. Then, table pressure: a standard BIFT keeps one entry per BFR-id, and each entry stores a next-hop identifier plus a full-length F-BM. For BSL = 256, that means 256 entries, each carrying a 256-bit mask, for one SD/SI context. So the draft's design objective and key idea is to change the forwarding flow from bit-centric to interface-centric: pre-compute the interface grouping once, then filter the bit string per interface. And semantic equivalence is mandatory.
The same replicas and the same per-interface bit strings must be produced. No change to the BIER header, the BIFT ID meaning, or protocol signaling. The improvement comes from decoupling who gets a copy from which bits survive on that copy. This is the architecture overview. The whole architecture has two stages: the first stage is the replication lookup stage and the second stage is the bit string isolation stage. When a BIER packet is received, the forwarding node extracts the BIFT ID from the BIER header, then uses the BIFT ID to check the RMT table and get the related interfaces. Then, after copying the packet to the related interfaces, stage 2 does the bit string isolation: the bit string is ANDed with the interface's entry in the IBMT table. If the result is zero, the replica is dropped; otherwise it is forwarded. Here is some detail about the two tables. First is the Replication Memory Table. The RMT maps a replication group identifier to the set of egress interfaces that have at least one downstream BFER. In practice, the replication group identifier may be derived directly from the BIFT ID. The control plane populates the RMT by grouping BIFT entries according to their unique BFR neighbors and their associated interfaces. The table size scales with the number of forwarding groups, not with every individual bit position. The right side shows what the table looks like: the replication group ID is the key and the egress interface list is the value. The other table is the Interface Bitmask Table, or IBMT. The IBMT maps each egress interface to one forwarding bitmask, or FBM. That FBM is the logical OR of all downstream BFER bit positions reachable via that interface. At runtime, the BFR computes bitstring_new = bitstring_original AND the FBM found by looking up the interface key in the IBMT. Entries are updated whenever the BIER topology or BIFT results change. The right side shows what the IBMT looks like: the egress interface is the key and the forwarding bitmask is the value. This is the forwarding procedure; it has four steps.
Step 1, resolve the group: the forwarding node extracts the BIFT ID, derives the replication group identifier, and queries the RMT. Step 2, replicate by interface: the forwarding node creates one packet replica for each egress interface returned by the RMT. Don't edit the bit string yet. Step 3, isolate the bit string: for each replica, look up the interface key in the IBMT and compute bitstring_new = bitstring_original AND the interface's FBM. The last step is decide and send: if bitstring_new is zero, discard; otherwise overwrite the bit string and transmit on that interface. The local optimization must produce exactly the same per-interface copies as a correct RFC 8279 implementation. If no RMT entry matches the BIFT ID, the packet is discarded, and the draft recommends logging an error. All other BIER header processing, such as TTL handling or label updates, follows the normal RFC procedures. There are two examples: in the first example, all three interfaces remain active; in the second example, a zero result causes a clean discard. What's the relationship with RFC 8279? Some things changed and others remain the same. What stays the same? The BIER header format and encapsulation remain unchanged. The BIFT ID meaning and ordinary BIER processing stay unchanged. The output replicas and per-interface bit strings must be identical to RFC 8279 forwarding. What changed locally? The router may organize local forwarding state as RMT plus IBMT, and the data plane iterates over interfaces instead of bit positions. This is a local implementation optimization, not a new protocol, so being deployable without any new signaling on the wire is the design principle. A practical forwarding pipeline view, illustrative only: first the packet goes through the parse-header module, then the RMT lookup, then through the replication engine, then the AND and zero-detect, and finally the egress rewrite. The implementation likely has some benefits.
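The two-stage procedure described here can be sketched in a few lines of Python. This is a minimal sketch, not the draft's implementation: the table contents, interface names, and bit numbering are hypothetical; only the RMT-lookup, per-interface AND, and zero-detect logic follow the procedure as presented.

```python
# Illustrative sketch of the two-stage, interface-centric forwarding
# pipeline (RMT + IBMT). Table contents and names are hypothetical.

# RMT: replication group ID (here taken directly from the BIFT ID)
# -> list of egress interfaces with at least one downstream BFER.
RMT = {1: ["if1", "if2", "if3"]}

# IBMT: egress interface -> FBM, the OR of all downstream BFER
# bit positions reachable via that interface.
IBMT = {
    "if1": 0b0000_0011,  # BFR-IDs 1 and 2
    "if2": 0b0000_0100,  # BFR-ID 3
    "if3": 0b1111_0000,  # BFR-IDs 5 through 8
}

def forward(bift_id, bitstring):
    """Return (interface, new_bitstring) pairs; semantically
    equivalent to per-bit forwarding for these tables."""
    replicas = []
    # Step 1: resolve the replication group via the RMT.
    interfaces = RMT.get(bift_id)
    if interfaces is None:
        return replicas            # no match: discard (and log)
    # Step 2: one candidate replica per egress interface.
    for ifname in interfaces:
        # Step 3: isolate the bit string with the interface's FBM.
        new_bs = bitstring & IBMT[ifname]
        # Step 4: zero-detect, then send or discard.
        if new_bs != 0:
            replicas.append((ifname, new_bs))
    return replicas

# Example: BFR-IDs 1, 3 and 6 set -> one copy each on if1, if2, if3.
print(forward(1, 0b0010_0101))
```

Note how the loop count is the number of egress interfaces, not the bit string length, which is the scaling argument made in the presentation.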
For example: a deterministic stage count, compact interface-oriented tables, and a natural fit for replication hardware. And there is also a key assumption behind the gain: a typical BFR has far fewer egress interfaces than available BIER bit positions. This is why grouping by interface reduces the amount of per-packet work. Some operational interpretation: RMT and IBMT are updated by the same topology or control plane changes that already affect the BIFT, and no changes are required on remote nodes to benefit locally. And some deployment considerations. First, the control plane's role: RMT entries are derived from BIFT computation results, and IBMT entries must be refreshed whenever topology or BIFT state changes. Protection: the RMT and IBMT should follow the same security model as the BIFT. Some packet-handling edge cases: if no RMT entry matches, discard and log; a zero result after the AND discards that replica only; a non-zero result continues with ordinary RFC 8279 or RFC 8296 processing. Where is the gain strongest? Large bit string lengths or many receivers, like the AI scenarios; relatively small interface fan-out; and hardware data planes that prefer fixed, simple bitwise operations. Okay, a brief summary. The proposal keeps BIER semantics, encapsulation, and signaling unchanged. It replaces per-bit forwarding work with an interface-centric two-phase pipeline built on the RMT and IBMT tables. The data plane then scales with the number of egress interfaces rather than with the bit string length, which is a better fit for high-speed hardware. Any comments are welcome. Thanks.
Sandy Zhang: Jeffrey.
Jeffrey Zhang: Can you go back to slide number 4? Here on the left side, step number 3, you said accumulate the interface forwarding bitmask. What do you mean by that?
Zhiqiang Li: Accumulate the interface forwarding bitmask? In RFC 8279 it's a per-bit operation: the forwarding node checks the bits of the bit string, and if a bit is one, it checks the FBM table to determine the interface. That's the standard procedure. But in this draft, we first get a broad candidate set from the RMT, then apply the exact per-interface mask from the IBMT, and that equals the same output as per-bit forwarding.
Jeffrey Zhang: Right, but there, once you find (actually it's not interface, it's really neighbor) you do the lookup on the first bit that is set. On the right side, you get an entry. That entry includes a neighbor, which has a bitmask. That bitmask is pre-built; you do not accumulate the interface forwarding bitmasks on the spot for each packet, right? So step number 3, if I understand it correctly, is not true. Imagine that you have all 256 bits set in the bit string, but to reach all those 256 BFERs they all go through the same neighbor. You will only do one lookup, and that neighbor's FBM will clear all the bits: you send one copy to that neighbor and you're all set. So indeed, if you have more neighbors to forward to, you would scan the bits again, but the number of lookups is bounded by the number of replications you have to do. Now, we can do the model you were describing, where we iterate not over the bits but over the interfaces, or more accurately the neighbors. But then, even if you have only one bit set in the bit string and you have 10 interfaces, you will do 10 iterations. So it's a tradeoff to me.
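For comparison, the per-bit, FBM-clearing loop Jeffrey describes can be sketched like this. The BIFT contents and neighbor names are hypothetical; the point is only that the number of lookups is bounded by the number of replications, not by the bit string length.

```python
# Sketch of the RFC 8279 style per-bit forwarding loop: one lookup
# per replication. Each BFR-ID maps to (neighbor, FBM), where the
# FBM covers every BFER reachable via that neighbor. Contents are
# hypothetical.
BIFT = {
    1: ("nbr_A", 0b0000_0111),  # BFR-IDs 1-3 via neighbor A
    2: ("nbr_A", 0b0000_0111),
    3: ("nbr_A", 0b0000_0111),
    4: ("nbr_B", 0b1111_1000),  # BFR-IDs 4-8 via neighbor B
    5: ("nbr_B", 0b1111_1000),
    6: ("nbr_B", 0b1111_1000),
    7: ("nbr_B", 0b1111_1000),
    8: ("nbr_B", 0b1111_1000),
}

def forward_per_bit(bitstring):
    """Find the lowest set bit, send one copy masked by that
    neighbor's pre-built FBM, clear the covered bits, repeat."""
    replicas = []
    remaining = bitstring
    while remaining:
        bfr_id = (remaining & -remaining).bit_length()  # lowest set bit
        neighbor, fbm = BIFT[bfr_id]
        replicas.append((neighbor, bitstring & fbm))
        remaining &= ~fbm   # skip every bit this copy already covers
    return replicas

# All 8 bits set, but only two neighbors -> only two lookups/copies.
print(forward_per_bit(0b1111_1111))
```

With many bits but few neighbors this loop runs only a handful of times, which is Jeffrey's point; the interface-centric model instead always iterates over the interfaces, even for a single set bit.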
Zhiqiang Li: Okay, let me explain. First, let me change to this page. The whole pipeline only has two stages, but some pre-computing is required: the control plane uses the BIFT to iterate over all the related interfaces and does the pre-compute work. Then the key sentence is there: a broad candidate set from the RMT plus the exact per-interface mask from the IBMT equals the same output as per-bit forwarding.
Tony Przygienda: I think the key is that that FBM is pre-computed, right? So yes, the key sentence is there: a broad candidate set from the RMT plus the exact per-interface mask from the IBMT equals the same output as per-bit forwarding.
Jeffrey Zhang: Comment?
Zhiqiang Li: Yes.
Jeffrey Zhang: Yeah, I won't claim that I completely understand the proposal, but let me point out two alternatives, right? So we've been working on prototypes and showing them off; even here in CNCF we had a test of a more scalable version where we can support much sparser, larger networks. We could drive that forward; we just haven't continued at this point in time because of priorities, and as soon as the working group wants to have that picked up again, we can perfectly well revive it. Shorter term, if you have problems with the forwarding complexity, BIER-TE can actually use exactly the same header and pretty much the same BIFT, and you have the interfaces in there, right? So one of the tricks you can do with BIER-TE is to put all the bits that are relevant on one router adjacent to each other. It's a little bit of bit string management: you use a very long bit string, but on each router you only need to actually process a very small subset of that long bit string. That's already an optimization you can do without having to change anything about the BIER header or the long bit string, right? So use TE, because you want to steer the traffic hop by hop, and that gives you the ability to avoid having to look at the whole of a very long bit string on every hop.
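The bit-layout trick described here can be sketched roughly as follows. The window offsets, widths, and router names are hypothetical; the point is only that, with a careful bit allocation, each router inspects a small contiguous slice of a long bit string.

```python
# Sketch of the BIER-TE bit-layout trick: allocate all bits relevant
# to one router contiguously, so that router only inspects its own
# window of the long bit string. Offsets and widths here are made up.

BSL = 256  # long bit string for the whole domain

# Hypothetical per-router windows: (offset, width) into the bit string.
WINDOWS = {
    "R1": (0, 8),   # R1's relevant bits live in positions 0..7
    "R2": (8, 8),   # R2's in positions 8..15
}

def local_bits(router, bitstring):
    """Extract only the slice this router must process; the other
    240+ bit positions are never scanned here."""
    offset, width = WINDOWS[router]
    return (bitstring >> offset) & ((1 << width) - 1)

# A packet with bits 1, 9 and 200 set: R1 sees only bit 1, and R2
# sees only bit 9 (as bit 1 of its own window); bit 200 is invisible
# to both.
bs = (1 << 1) | (1 << 9) | (1 << 200)
print(local_bits("R1", bs), local_bits("R2", bs))
```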
Zhiqiang Li: Okay, thank you.
Tony Przygienda: All right, so from my side, the first observation is that the RFC 8279 forwarding procedure is non-normative, right? In the sense that it never says you must implement it this way; it is only normative in the sense that whatever you build, the observable behavior must be identical. And that was intentional, because we knew that multiple vendors had different strategies for implementing this kind of lookup, and some of them were talking about the stuff that you are showing. So it's valuable, for example as an informational draft describing one implementation strategy. And you are pretty honest about the tradeoffs, right? With a large interface fan-out this whole thing becomes far less interesting, whereas with few interfaces it may be a better way to implement it. I'm missing two things, two considerations, in the draft. One is a section on ECMP: you have to show that the ECMP behavior is equivalent to RFC 8279, which is very important. Okay, so an ECMP section. And the second one is a consideration about control plane changes: how many ASIC updates do you have to write? Because this is a very important consideration, right? The ASIC write throughput very often is the bottleneck. So you may waste way more memory on the ASIC, but when you have control plane flapping, you write much less to the ASIC, which may be far more interesting than actually having a smaller table. But it all very much depends on the technologies you have available. So I would encourage a section on ECMP, because without that it's not clear that it's really conformant, and a second one considering how much you have to write to the ASIC depending on control plane changes.
And otherwise I think, yeah, it's valuable, because it will give implementation guidance to people, and it's a good data point on the different ways you can implement BIER forwarding. Which was actually the intention from day one: we described the simplest way to implement it, but we knew that people had a lot of interesting ideas, including building specialized lookup hardware that can do things like vertical lookups and whatnot. Thanks. That's all from my side.
Zhiqiang Li: Okay, thanks.
Presentation: 5-BIER-FRR update (Slides: https://datatracker.ietf.org/meeting/125/materials/slides-125-bier-bier-frr-update-00)
Tony Przygienda: Okay, let's move to the last presentation. So, I've been trying to start. Huh, I thought I uploaded a PDF. Hmm. Darn. These slides look pretty crazy; that's not good. Maybe they will be changed automatically, yeah. We have uploaded your newest version. Yeah, so all the work on these... it's too late now. So, I tried to add to, and write better, the introduction and explanation of the whole mechanism, and in the process of doing that, I think I also figured out that some of the things were technically redundant and not really needed, so I think we can simplify it from there on out. The text that was added was primarily to better explain BIER to the people coming in who only understand FRR from the unicast side, and to make it as easy as possible for them, in their own unicast terminology, to understand how BIER-FRR works, so that it's very clear that it is a closely related technology, very easily derived from the unicast FRR that people are already doing. And likewise for the BIER people, so that they understand the terminology being used in unicast FRR, because that wasn't in the draft so far. So the examples were really difficult. Unfortunately there is a little... I don't know what the mess is there on the left-hand side; I'll have to upload another set of slides so that it's better for the proceedings. So the important part was that FRR node protection in stateful multicast doesn't work, because one of the things you would have needed to do was, from the point of local repair, to send out another unicast-encapsulated copy to each next-next hop when the node fails, and you wouldn't even know which next-next hop you need to send it to.
You may have 10 next-next hops but only two of them need to receive a packet for a particular (S,G), so you can either flood to all of them unnecessarily, or you would have to extend PIM in a complete and very complex way so that you do have the state of the next-next hops for every (S,G). We thought about this very long and hard in the beginning of the 2000s and nobody ever wanted to go through that trouble, right? So we don't have node protection for IP multicast, for good reason. But we can have node protection for BIER, and that's shown on the right-hand side very easily. R1, being the point of local repair, knows that R2 is broken, so the only thing it needs to do is tunnel to some good-enough node, R11 up there, and from there on the BIER packet flows out again and perfectly well replicates itself to reach all the same destinations it would have reached without the failure happening, right? So that is how beautifully BIER-FRR node protection will work in most cases, and it's a huge benefit if you want to have resilient multicast services in the same way as you have FRR resilience in unicast. So, the document right now defines a tunnel mode and an LFA-based mode. And I think, ultimately (I have to discuss that with the co-authors), we should really scrap the tunnel mode, because it's really not a significant special case. What is shown here for the tunnel mode is, yeah, what tunnel mode is meant to be in this current draft when we're talking about link protection. So now we're not talking about R2 failing, we're talking about the link L1 from R1 to R2 failing, and at that point in time you want to send the packet in a unicast tunnel all the way to R2. And that is actually something you can already do with existing stateful multicast, because then R2 of course has the existing tree, right?
So this is kind of something that doesn't even require BIER, but link protection isn't really all that interesting. You only use link protection for final nodes where you can't do node protection, because ultimately, when a link fails, you don't know whether the node or the link has failed, so the best thing to do is always assume the node has failed, and with unicast FRR that actually even gives you better results, right? So this case is useful, it's good, but it doesn't really have to be a special case, as we'll see later. And then basically the whole tunnel-mode node protection doesn't work, again with the same reasoning: if you were going directly to the next-next hops in BIER, that wouldn't work, because you would still need to figure out how to make that work insofar as each of these tunnels would have a different set of BFERs that you go to. So you can't simply set this up statically, you have to calculate something, and voila, you are in the solution territory of the LFA-based FRR, right? So that's kind of to say that we need to cut down or scrap the tunnel mode specifications of this document. Which brings us to the really good mode, which is LFA-based BIER-FRR. And so this document now explains the three modes that have been defined in unicast. The LFA is the very simple one where you have a direct neighbor. There is also the terminology of P and Q space introduced, which, if you understand unicast FRR, you already know. And darn. So then you've got the remote LFA and the topology-independent LFA. And they all apply equally to BIER, with the example topology I had initially. There is only one significant difference, insofar as in BIER you may not actually want to go to exactly the same LFA destinations that you have calculated for unicast, because then you may need to send too many copies.
You can optimize that: if you do a BIER-optimized calculation of the LFA endpoints, then you'll end up with fewer copies that you need to send out. So this slide is also completely broken. Darn. So the important, fundamental part, where the BIER considerations become more interesting, is when you have the so-called partitioned Q space, which means that upon a node failure you cannot reach all the destinations you want to send to through a single tunnel that you're building, right? In this case you see that when R2 fails, the whole network becomes partitioned between the top and the bottom, and from R1 you need to send the traffic back in one tunnel to the bottom and through another tunnel to the top receivers. And that basically brings us to the problems we are currently trying to figure out: how do we propose this problem be solved? The current solutions in the draft use the best optimizations, which is really, instead of a normal BIFT, the ability to extend the BIFT so that the BIFT changes in some fashion when you have an FRR event. Unfortunately, in a logical sense, for every possible link that can fail you would ideally like to have a separate BIFT. And that, of course, may be very expensive and doesn't necessarily scale when you already have thousands of BFERs for thousands of bits and then need to take that times 50 when you have 50 interfaces. So one of the other options one can do, which also goes back to this example, is to simply partition the receivers by the sets after the FRR event. And what you can then do is, instead of the normal adjacencies to each of the BFR next hops, use protected adjacencies like you have in unicast, which say: go out this interface, but of course you're checking, at the point in time when you're sending the packet, whether the interface is down. If so, then encapsulate into another tunnel.
And here the interesting part you could do is that R1, instead of sending just one copy to all the receivers normally, would actually send out two copies, one for R5 and R6 and another one for R7 and R8. And now that you've basically separated this out as two different adjacencies, it would also mean that you do not have to change the BIFT. You're just using two different protected adjacencies, and both of them are rerouted when the interface fails, right? So that would be a way of making FRR work much more easily when you have an FRR case, at the cost of, in normal operation, sending out two copies to R2, right? So that sounds crazy at first, I think, but it would allow you to do FRR in a lot more FRR cases without having to change the BIFT; you only need the protected adjacencies. So that might actually be quite interesting for bringing implementations of BIER-FRR closer, because while we did a P4 prototype of BIER-FRR with the changed BIFTs, I have a big question mark as to how valid that would be for commercial BIER routers. So yeah, I can't even do nice PowerPoint presentations right now, but I would love to use the opportunity here to get away from the ASCII pictures and into SVG pictures, because when these things are actually displayed correctly, I think they're very instructive as to how the stuff works. I hope... yeah, I'll upload the PDF as well, since seemingly the PPTX conversion failed, something like that, right?
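The protected-adjacency idea described above can be sketched roughly as follows. The topology, the two-adjacency split, and the tunnel names are all hypothetical; only the check-the-interface-at-send-time fallback logic follows the talk.

```python
# Rough sketch of protected adjacencies for BIER-FRR: each adjacency
# carries a primary interface plus a pre-computed backup tunnel. At
# send time the node checks the interface and, if it is down, sends
# the unchanged BIER packet into the backup tunnel instead. The BIFT
# itself is never rewritten on failure. Names are hypothetical.

# R1's two adjacencies toward R2 after partitioning the receivers:
# one for BFERs R5/R6 (bit positions 4-5), one for R7/R8 (6-7).
ADJACENCIES = [
    {"fbm": 0b0011_0000, "interface": "to_R2", "backup": "tunnel_to_R11"},
    {"fbm": 0b1100_0000, "interface": "to_R2", "backup": "tunnel_to_R12"},
]

def send_with_protection(bitstring, link_up):
    """Return (egress, masked bitstring) per adjacency; link_up maps
    interface name -> bool."""
    out = []
    for adj in ADJACENCIES:
        masked = bitstring & adj["fbm"]
        if masked == 0:
            continue                       # nothing downstream here
        if link_up[adj["interface"]]:
            out.append((adj["interface"], masked))
        else:
            # FRR event: fall back to the pre-installed tunnel; the
            # bit string semantics and the BIFT stay untouched.
            out.append((adj["backup"], masked))
    return out

# Normal operation: two copies both leave on the link to R2.
print(send_with_protection(0b1111_0000, {"to_R2": True}))
# Link failure: the same two copies take their separate backup tunnels.
print(send_with_protection(0b1111_0000, {"to_R2": False}))
```

This illustrates the tradeoff Tony describes: two copies to R2 even in normal operation, in exchange for FRR behavior that needs no per-failure BIFT variants.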
Sandy Zhang: Maybe you can change it to PDF and then upload it.
Tony Przygienda: All right, so this is work in progress, but hopefully it will end up becoming a lot more useful, both as an explanation for people to understand it and for the conclusions about what can be done easily. From my side, the observation is that we got the stuff kicked back from the IESG with unresolved reviews. Please resolve the reviews, and then, frankly, I'll start to pay attention, because the stuff is complicated. It got the initial comment, when the work was adopted, that it has to be sold as beneficial on top of the very simple IGP-FRR. And at this point in time, until the comments are resolved, especially from whoever made the big comment, we won't be able to progress it in any meaningful fashion, because it will basically get kicked back again. Thanks.
Sandy Zhang: Sure. Okay, so that's all for the presentations. Is there anyone who wants to say something? Tony, do you want to say something at the end of the session?
Tony Przygienda: Um, no, not really. It's all going fine. We have a lot of documents in the queue, which is also my fault; I'll pay more attention to moving them forward towards the IESG. But, you know, I'll fine-comb the stuff with Sandy and we'll basically start to grind it down. Thanks.
Sandy Zhang: Okay, so we are done for today. Yeah, the interesting discussion that's also going on is, you know, with this multicast in AI. We'll see whether anything useful comes out of that, and we should pay attention and accommodate when real requirements come in that make BIER useful and people are literally willing to deploy the stuff. Yeah, thanks.
Sandy Zhang: Thank you. Thank you. And see you in Vienna. Yeah. See you in Vienna. Bye bye. Bye.