
Session Date/Time: 16 Mar 2026 06:00

Stig Venaas: Can you hear us okay online?

Lenny Giuliano: Yes.

Stig Venaas: Okay, thanks. I'll turn this off. Yeah, thanks for updating the document then.

Mike McBride: Oh, yeah, because it just happened this morning.

Stig Venaas: Yeah, exactly. You want to stay in this?

Mike McBride: Yeah, let's do.

Stig Venaas: You stay in the same space, okay. Hi, everyone. Welcome to PIM. Let's see, so this is the Note Well here. I hope you have all seen this, and please make sure you are familiar with this slide. Here is our agenda. I'm going through the agenda, I mean, the first item here right now, and then we've got several items afterwards. Any comments on the agenda? Any issues? Okay, then I'll go through the working group status. We've got draft-ietf-pim-rfc1112bis. I think we all thought it was done a long time ago, and I'm sure Toerless is ready to be done with it, but there was a lot of really good review feedback since last IETF, so Toerless is incorporating that. We are pretty much ready to request publication, I think. So if you like, have a look at the latest revision and see what you think; we're hopefully ready to request publication now. DR improvement and backup DR are not discussed this meeting. We're trying to push a bit to make some conclusions there, but we're not ready to discuss them today, unfortunately. The point-to-multipoint drafts are in the RFC Editor's queue. We also got a new RFC since last time, the MoFRR one, so that's great. Then next slide; let's see, I keep forgetting to do this. Okay, we've got some YANG model drafts, not discussed this time. Lessons learned we will discuss. Then we've got the multicast auto-discovery work, which is progressing. We've got these four drafts: one is now in the RFC Editor's queue, one is in IETF Last Call, and the gap and zero-conf drafts both passed Last Call, where we are waiting on some directorate reviews or incorporating review comments. We'll probably request publication for the two remaining shortly. We've got draft-ietf-pim-multipath-igmpmldproxy, not discussed today. Publication has been requested for the PFM forwarding enhancements.
We've got the deterministic ECMP draft; we need to get some work done on that, not discussed this meeting. We just adopted draft-ietf-pim-multicast-over-srv6. We also just adopted the draft-ietf-pim-rfc8059-9798bis draft, so we'll talk about that later this meeting. And draft-ietf-pim-flex-algo just got adopted, like this morning or so. So a lot of activity. Any comments on that, or shall we move on? Yeah, okay. Gunter.

Gunter Van de Velde: I have to sit almost on your lap here. Voilà, I'm going to put it more in the middle of the room. Just a generic comment; this is Gunter Van de Velde, routing AD. We've been pushing through quite some drafts from the PIM working group over the last couple of months, and one of the common trends we have been fighting is alignment with the charter. So that is maybe something to be a bit more careful about in the future: if we adopt documents, check that they actually are in the charter, and also that the shepherd writeup itself has an explanation of why the document is in the charter. It really helps; we saved a few DISCUSSes already by doing that, so that was actually quite a good practice. Going forward, if we adopt something, check if it is in the charter. If not, we do an update to the charter; not a big problem, just some red tape we have to accommodate.

Stig Venaas: Yeah, we just adopted a couple of documents, and maybe we should discuss them and the charter. At least these are documents that the working group wants to work on, but yeah, let's discuss the charter and see if we need any adjustment there. Okay, thanks. Go ahead, David.

David Lamparter: David Lamparter. Perfect lead-in, because it's not in charter. There is this 6man draft that takes in the new IPv6 multicast address format between host and link scope. It's used to tie into special behavior groups from layer 2. That's the one-sentence summary. Look at it if you're interested. It's in 6man, not here.

Stig Venaas: Yeah, just as a comment, so one interesting thing is of course that we don't have a scope identifier for it, so it's a little tricky, but anyway. Okay.

Mike McBride: Okay, so this draft has been presented several times. We're in no hurry to progress it, but it may be nearing the point of being done. It's a pretty good document, if I do say so myself, trying to give an overview of the last several decades of lessons we've learned from the different multicast protocols we've used. The one in red is the section we added per your comments last IETF, referencing draft-ietf-pim-igmp-mld-snooping-yang-l2vpn-ext. It was also suggested by someone that we briefly mention MLD and how it corresponds to IGMP, so we added that statement to the IGMP section. And then we added a bunch of text, which isn't great for PowerPoint, and we're not going to read it here, but hopefully you've had a chance to briefly look through it. It's nothing new to this group. It just goes over the purpose of IGMP/MLD snooping and shares some of the operational problems that have occurred over the years, like when a functioning querier disappears and how the group state can sometimes be black-holed unintentionally. So just some basic comments about the things we learned while using snooping. You can't see the whole slide here, but at the bottom there's an RFC that was also mentioned, I think by Mankamana last IETF. It's an RFC that came out of MAGMA about 20 years ago, I think, and it's a pretty good overview of IGMP snooping and some of the operational issues, so we reference it in the document, and I think it's a decent addition to the draft. These are the same next steps as we've had for the last year or so; we continue to make progress. It's a bit of a unique draft in this working group because it's just kind of an overview of protocols.
If there's anything else people can think of that we need to add to this draft, feel free to bring it up now or on the list; you've done great at giving feedback, and we continue to update it. Any comments for now?

Stig Venaas: I wonder if this is in our charter or not, but...

Mike McBride: It is! We updated the charter to be able to do that, yeah, exactly. So it's in charter. Okay, so we're going to move on, but...

Mike McBride: It may be getting closer, Stig, because you asked before if we're ready to progress the draft.

Stig Venaas: Right.

Mike McBride: And we may be slowly getting there. We'll think about it.

Stig Venaas: Should we ask the room or no?

Mike McBride: Not yet. We'll wait.

Stig Venaas: Okay. Oh, okay.

Speaker 1: Is the queue open?

Mike McBride: Yeah, go ahead.

Dave Thaler: Hey, Dave Thaler. Two comments. You asked whether there's anything else to add in there. I took a brief glance and I don't see anything about the socket APIs RFC that we did, so it might be useful to have a paragraph or something talking about that work. And since somebody mentioned scope IDs a couple of minutes ago, that might also be the appropriate place to talk about the concept of scope IDs for multicast.

Mike McBride: Do you know the RFC offhand?

Dave Thaler: I'm an author on it, and it was way back; I don't know offhand, but I'll put it in the chat.

Mike McBride: Okay, that'd be very helpful. Thank you. That's a good one; we'll add that. Is there another comment?

Speaker 2: Prasidh, Cisco Systems. Would it be nice to add something about quality of service, QoS, or any other data plane considerations that are specific to multicast? Because multicast is a lot tougher on the data plane.

Mike McBride: Did you understand that? Can you repeat that again, if not?

Speaker 2: Prasidh. What I wanted to ask is, would it be good to add a section on quality of service or anything about the data plane in general?

Mike McBride: So quality of service and data plane in general, make some comments about that.

Speaker 2: Yes, yes.

Mike McBride: Okay. Very good. Yeah, I think you're the one that helped us with the last update that we did on snooping. So okay, yeah, thank you. We'll get that added.

Speaker 2: And if you want any help, I'll be glad to help.

Mike McBride: Great! Perfect. Thank you.

Stig Venaas: Okay, thanks. Toerless is not here, so let's see. Yeah, so Prasad, you're up next if you are ready.

Prasad Miriyala: Sure Stig.

Mike McBride: Do you have your microphone on?

Prasad Miriyala: I'm sorry?

Mike McBride: Do you want the slide control or do you want us to forward the slides?

Prasad Miriyala: You go ahead; I'll just let you know when the slide needs to be moved. Mike, is that okay?

Mike McBride: Yep.

Prasad Miriyala: Thank you, Stig and Mike. Good afternoon, everyone; it's still morning here, sorry about that. So this was mostly covered in Montreal. I'll just quickly recap what was covered there and what has changed since then. Of course, there have been some changes to the draft after I made the slides, so I'll talk about that briefly as well. Next slide please. As Stig mentioned earlier, this was adopted after the Montreal meeting, in the January timeframe, but it was presented earlier in Madrid as well, and essentially this work is about getting RFCs 8059 and 9798 onto the standards track. They are both Experimental RFCs. Next slide please. So why do we need this document? As I just mentioned, both 8059 and 9798 are Experimental. We have enough deployment experience, and this is probably one of the last few LISP multicast documents undergoing the Experimental-to-standards-track bis process. The 6831bis draft, I think, just cleared Last Call since last IETF; that is the base document for LISP multicast, and it is done in the LISP working group. This one is being done in the PIM group primarily because the parent documents split the same way: RFC 6831 was done in LISP, while 8059 and 9798 were done in PIM, and therefore their child documents, if you will, are being done here. So what do the documents do?
They essentially define a mechanism for the LISP ETRs to signal the underlay transport type. When multiple LISP sites come together, the underlay can be multicast capable or not multicast capable, and there can be a mix of RLOCs or LISP sites that support a V4 underlay and a V6 underlay. Given that heterogeneity, the four combinations, we need some signaling mechanism so that the receivers can signal to the ingress points what kind of transport they would like to receive the multicast packets, the overlay traffic, on. That's the reason for having this particular document. Why were there two documents in the first place? 8059 was done a few years earlier, and at that point the underlay was limited to unicast; there was no support for a multicast underlay. 9798 came along probably last year, or maybe two years ago, and added support for the multicast underlay. Therefore there were two documents, and now that everything has settled down, we would like to converge on one standards-track document that essentially combines the two base Experimental documents. From an implementation perspective, yes, there are vendor implementations available for this particular signaling. Next slide please. So, just to go over the details: there is a join-prune attribute defined. The last bullet is a remnant of the old slide: the IANA code points can be reused. The question mark at the end is redundant; it's not a question, it's a conclusion. And on Saturday there was a comment on the text of the IANA section that came from the IANA early review.
So thank you to the IANA team for doing the early review. They have in principle agreed to reuse the code points, or rather to update the IANA registrations so that the Experimental documents no longer own the code points and the new bis draft does, and the language in the IANA section has also been corrected. Amanda Baber from the IANA team sent a detailed note, I made a change to the draft, the change was reviewed by Amanda, and I have submitted the new version, so the -01 version of this draft should now have the correct IANA section as well. That is taken care of. Next slide please. I think that's mostly it. We'll continue to work on the document based on any feedback we get. Of course, thank you for the IANA feedback. For any other feedback, please let us know any detail that needs to be added, and at some point we'll request WGLC once we get sufficient comments, or once the draft has gotten through sufficient review. That's mostly it from my side.

Mike McBride: Any questions for Prasad? All right, thank you.

Prasad Miriyala: Thank you, Mike and Stig, thank you.

Mike McBride: Toerless just left again. Oh, maybe they think he was skipped? So you're up.

Stig Venaas: Okay. All right, I'm going to talk about this new draft about how to actually send native IPv4 multicast through an IPv6 core network. Which maybe sounds a bit strange, because if you forward V4 packets, is it really an IPv6 network? But we'll see. So you basically have a network where the core routers don't have any IPv4 addresses assigned, so the routers can only send IPv6 protocol messages. But we want to somehow build a V4 multicast tree to forward native IPv4 multicast through that network. We will also talk about RPF lookup and how we can find IPv6 PIM neighbors for an IPv4 (S, G) or (*, G). Okay. So if you look at this network, just consider unicast on this slide. We have these five routers. The three in the middle only have V6 addresses assigned; that's our core network. Then we have the two routers at the edge that also have V4 unicast addresses assigned. There can be more routers outside of the edge; it doesn't have to go all the way to the source or receiver, but the main thing here is we have a core and then some routers at the edge of the core. So with multi-protocol BGP, there is a way for us to exchange IPv4 prefixes over a V6 peering: you use a BGP session over IPv6 TCP, but the contents are IPv4 prefixes. By doing that we get V4 prefixes in our RIB with an IPv6 next hop, and that's exactly what we need to forward unicast packets: when we get a V4 unicast packet, we know which router, or which interface, to forward that packet out of. Okay, that's it for unicast. So for multicast, we have the same situation with a RIB containing IPv4 prefixes with V6 next hops, and we will make use of that.
Basically, when we do an RPF lookup to find which neighbor to send a join to, we get the V6 next hop from the RIB, and then we send a PIM join to our IPv6 PIM neighbor, but the contents of the join will be an IPv4 (S, G), or it could be (*, G). That way we can send PIM messages between the routers using IPv6, because there's an IPv6 header, but the contents are still IPv4. And that's sufficient to build an IPv4 multicast tree and do forwarding as usual. Are you all with me so far? Okay. If you look at the PIM Sparse Mode RFC, all the addresses in PIM messages are in the encoded address format, and that format allows us to specify the address family. So when we send a join, even though there's an IPv6 header on the join, it's possible, at least based on the format, to encode and say that this is an IPv4 address. What I want to do here is say that the upstream neighbor address is a V6 address, because we are sending this to our IPv6 PIM neighbor and we have a V6 next hop from the RIB, so that makes sense, at least to me; but the multicast group and source are IPv4. If we look at what the RFC says some more, we can see the encoding format, where you see the address family, so we are making use of that. What is a bit interesting, though, is the text I quoted from the RFC, where it says that the group and source addresses must be of the same family. That's fine; I want them all to be IPv4. But then it says that you're not permitted to mix V4 and V6 addresses in the same message, and it's maybe slightly unclear whether that's just discussing the source and group addresses or also the upstream neighbor address. And then there's a SHOULD for what you should do if you see that. So the format in a way allows this to be done, but it's maybe not quite according to the RFC to do what I want to do.
It depends how you interpret "not permitted" and the SHOULDs. But the main thing is, this can sort of be done, but we don't know how a router would react if we tried to send such a message to it. So, because of that... okay, sorry, just... Okay, so this is an example join message. You can see here that in the upstream neighbor address field we have encoded a link-local address, which is a V6 address, while the multicast group and source here are IPv4 addresses. But because of the compatibility issue, I want to introduce this hello option, so a router can indicate that it is capable of processing such messages. We would only want to send a join like this if we know for sure that the neighbor can handle it. One more thing here. If you want to do IPv4 unicast through your network, then you need an IPv4 RIB on each of the core routers so they can forward packets. But if you don't care about IPv4 unicast, if we only care about multicast, then we can use an RPF vector and we don't even need IPv4 prefixes in the RIB in the core. We really want to minimize the amount of IPv4 we're doing in the core, right? By doing this, we don't need any IPv4 state in the core other than the multicast routing and forwarding tables. The trick is that we can do a BGP peering edge to edge, where we learn the V6 address of the router on the other edge, the ingress if you like, and then we can use an RPF vector to tell the core routers that we want to RPF on that V6 address. It's like how we do RPF vector normally. So, what do you think of this? Does it make sense? Is it something we want to do in this working group?
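The cross-family encoding Stig describes can be sketched as follows. This is a rough illustration of the RFC 7761 Encoded-Unicast address format (address family 1 = IPv4, 2 = IPv6, native encoding type 0), not text from the draft, and the addresses are made up:

```python
import ipaddress
import struct

# IANA address-family numbers used by the PIM encoded address formats
# (RFC 7761 section 4.9.1): IPv4 = 1, IPv6 = 2.
AF_IPV4, AF_IPV6 = 1, 2

def encoded_unicast(addr: str) -> bytes:
    """Pack a PIM Encoded-Unicast address: family, encoding type 0, address.
    (The Encoded-Group and Encoded-Source formats add flag and mask-length
    bytes, omitted here for brevity.)"""
    ip = ipaddress.ip_address(addr)
    family = AF_IPV4 if ip.version == 4 else AF_IPV6
    return struct.pack("!BB", family, 0) + ip.packed

# The mix described above: an IPv6 upstream neighbor carrying an IPv4 source.
upstream = encoded_unicast("fe80::1")       # family 2, 2 + 16 bytes
source = encoded_unicast("198.51.100.7")    # family 1, 2 + 4 bytes
```

Because each encoded address carries its own family byte, the join can name an IPv6 upstream neighbor while the group and source stay IPv4; this is exactly the mix the quoted RFC text leaves ambiguous.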

David Lamparter: David Lamparter. Let's go from easiest to hardest. The easiest thing is that 5549 is superseded by 8950, and idr is working on making this a generic thing in BGP, so things are definitely heading in this direction. I can also say we already have this in our code base, in a way, because we need to handle the case of getting that result. It doesn't actually do the PIM packets with the cross-address-family addresses in there; it just tries to find the V4 neighbor that matches the V6 neighbor, to handle the special case. So we were already running into this as a problem, and I actually know where this is in the code base; it's very annoying to not be able to do what you're describing here. But I would also like to note that one of the most dangerous, or most complicated, things here is running into a situation where we're handling PIM over IPv6 but the same packet contains things that apply to both V4 multicast forwarding and V6 multicast forwarding, and that is something I would absolutely like to exclude here.

Stig Venaas: Yeah, so I would say I want to stick to the rule that says all sources and groups in the message need to be the same address family. So that particular PIM join would only have V4 Ss and Gs in it.

David Lamparter: Hmm, okay, that would be sufficient. I'm not sure if it makes sense to make it even more explicit, even having some option or something at the beginning of the packet to identify which of the two things you're going for, because otherwise you just need to scan through the packet and see if it's V4 or V6. But as long as we don't mix things in the same packet, that should be fine.

Stig Venaas: Right. I'm a little curious how existing implementations would react. At least the ones I know would check the address family and see, oh, it's the wrong family, and just ignore the whole thing. But there could be routers that do something strange. So that's why I want the hello option.

David Lamparter: Yeah, and lastly, I definitely think this is in scope and we should be looking at this. People are already running into the limitation of not being able to use 8950 because their multicast breaks; I know because we had to add the workaround.

Stig Venaas: Okay, thanks. Yeah, one challenge I know of is that many implementations run separate processes for V4 and V6 and can't mix them, so it might be a little tricky implementation-wise, depending on the implementation. But I want to focus on the protocol here, not the implementation. Yeah.

Mike McBride: All right, so you're done?

Stig Venaas: Yeah, sure.

Mike McBride: Just to back up a little bit. Is the motivation to avoid tunneling and use this encoding instead? Are you worried about the overhead of using tunnels? Is that the whole idea, and you want to go with encoding V4 in a V6 header?

Stig Venaas: Right, yeah. You could do tunneling today. So yeah, the idea is to avoid tunneling, and also to avoid having to deploy V4 addresses on your core routers.

Mike McBride: And David, is that why you've been looking at this, for that very same reason, to avoid the MTU overhead of a tunnel? Or is that the whole thing?

David Lamparter: Well, if you tunnel it, the tunnel is out of necessity unicast. So if people want to keep it as multicast, they don't really have a better choice; you could maybe do it with BIER or something, but that's not deployed enough in the scenarios we've been looking at. So: keep the traffic as V4 multicast, or as any type of multicast, but you don't have the addresses anymore, because that's how you deploy the network nowadays, and then how do you do it? That's the combination of factors here.

Mike McBride: Okay.

Hooman: Hooman, Nokia. Sorry, maybe I didn't get one thing. When the packet goes out, do you convert it from IPv4 multicast to IPv6, or does it just go out IPv4 all the way? Does the join go all the way to the source, and the source sends back an IPv4 stream?

Stig Venaas: If you look at the state on one particular router, you only really care about the incoming interface and the outgoing interfaces. Whether the neighbor you received the join from was a PIM6 neighbor or a PIM4 neighbor doesn't really matter; you just know that someone on this interface is interested in receiving the multicast. So the forwarding table you build is exactly the same regardless of how the join came. Or, to say it a different way: it's exactly like a normal IPv4 PIM join, except that the IP header is a V6 header. All the processing and the state you build from it are the same as for a regular IPv4 PIM join.

Hooman: Okay, but when you send the join (again, I didn't quite understand this, that's why I'm asking) up into the transit routers, and they start looking at the IPv4 join, how does the V6 core forward that toward the RP or the source?

Stig Venaas: Right, so if you look at an (S, G): these core routers here have a V4 RIB, so for an IPv4 packet they actually know where to forward it.

Hooman: It's dual-stack then? The core routers are actually dual-stack routers.

Stig Venaas: In a sense, yes: they have V4 prefixes. But they don't have any IPv4 addresses configured on the router interfaces.

Hooman: They have V4 prefixes but they don't... Okay.

Stig Venaas: So you basically know where to route a V4 unicast packet, but you're avoiding configuring V4 addresses on your router.

Hooman: So this is something that, I'm guessing, already exists for unicast, that you can...

Stig Venaas: Yeah, so today there are deployments using 5549, or the newer number that I don't remember, and they do this for unicast. If you look at V4 unicast, all you need to know when you forward a packet is the MAC address, like who it goes to. And you can learn that from BGP and your V4 forwarding table. We're basically doing the same with multicast.
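The RIB situation being described can be sketched as a toy table. This is a minimal illustration, with made-up addresses, of IPv4 prefixes whose next hops are IPv6 (RFC 8950 / old 5549 style):

```python
import ipaddress

# Toy RIB: IPv4 prefixes learned over an IPv6 BGP session, each with an
# IPv6 next hop. All addresses are illustrative, not from the draft.
rib = {
    ipaddress.ip_network("192.0.2.0/24"): "fe80::a",
    ipaddress.ip_network("198.51.100.0/24"): "fe80::b",
}

def v6_next_hop(v4_dest):
    """Longest-prefix match on an IPv4 destination; returns the IPv6 next
    hop, which for multicast is the neighbor the PIM join would go to."""
    dest = ipaddress.ip_address(v4_dest)
    matches = [p for p in rib if dest in p]
    return rib[max(matches, key=lambda p: p.prefixlen)] if matches else None
```

The same lookup serves both cases: unicast forwarding resolves the IPv6 next hop to a MAC address, while the multicast RPF lookup uses it as the IPv6 PIM neighbor for the join.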

Hooman: Yeah, okay, so I get that if you have BGP you can learn all this stuff, but what got me confused was that your BGP was from R1 to R5; it wasn't like R... R whatever...

Stig Venaas: Okay, so one way of deploying this is with a V4 RIB on each of the core routers, and that's probably what most people would do, because it allows V4 unicast to work. But if you don't care about V4 unicast, one option is to do what I have on this very last slide, and that's using the RPF vector. That basically tells the core routers not to use the source address in the (S, G) for the RPF lookup; it says to use this V6 address instead. If you do that, then the core routers don't need any V4 RIB. They can still build an IPv4 mroute, or a forwarding table if you like, because they know which interfaces they received joins on for that (S, G), and they also know the upstream interface and the IPv6 neighbor the join was sent to.
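The RPF Vector alternative can be sketched like this. The dict-based join representation and field names are invented for illustration; the real mechanism is the RFC 5496 RPF Vector attribute carried in the join:

```python
# Minimal sketch: which address a core router resolves to find its
# upstream neighbor. With a vector present it RPFs on the IPv6 vector
# address and needs no IPv4 RIB; otherwise it falls back to the IPv4
# source as usual. The join encoding here is hypothetical.

def rpf_target(join):
    """Return the address used for the RPF lookup on a core router."""
    return join.get("rpf_vector") or join["source"]

with_vector = {"source": "198.51.100.7", "rpf_vector": "2001:db8::1"}
plain = {"source": "198.51.100.7"}
```

With the vector, the only IPv4 state left in the core is the multicast routing and forwarding entries themselves.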

Hooman: Okay. We do something like this, but it's like a NAT type of thing, where you grab the IPv4 (S, G) and you NAT it to the...

Stig Venaas: Right.

Hooman: Well, it's a NAT type of concept; I don't want to call it NAT.

Stig Venaas: Sure, sure. You could say an alternative would be to somehow embed all your V4 prefixes into V6 and then...

Hooman: And I thought you guys did it too. Actually, I thought we were looking at it and we somehow saw it in (are we allowed to say company names?) well, Cisco's documentation too.

Stig Venaas: I don't know. But yeah, if you use mapped addresses, then you could say in your joins that the (S, G) is address family IPv6, and the S and G are V6 addresses. But then it gets a little tricky. For one thing, it's more natural to just use V4 addresses, and our current format allows it. But also, you would have to say, okay, this is a mapped address, so I need to convert it to IPv4 to do a RIB lookup in my IPv4 RIB. Or you can populate them in your V6 RIB, but I think it's cleaner, and similar to 5549, to just treat them as V4 addresses rather than do some kind of embedding in V6 space.

Mike McBride: So there are a lot of solutions to do this, and you're introducing a new one that you think is cleaner; is that basically what...?

Stig Venaas: For multicast there are no solutions to do this, other than encapsulation. Yeah, okay, sure, for native. But if you do encapsulation, then how do you map your V4 group to maybe a V6 group if you want to do V6 multicast? And you also, of course, have the overhead of encap and the MTU issues.

Mankamana Mishra: Mankamana, Cisco Systems. To answer Hooman's question: if you look at your join, end to end it is going to be V4 state; there is no V6 state getting created. The only thing we are using V6 for is to transport your V4 join, from R4 to R3 here. What we are showing between R1 and R5 is that you have 5549, or the newer RFC, which tells you how to reach the V4 addresses, so it is pointing to R1's V6 address. Now you're using R1's V6 address to let your join go hop by hop; it still creates V4 state, but uses the V6 IP address. Otherwise your default PIM join will always try to look for a V4 neighbor address to send the join to. So your data plane and control plane remain IPv4 end to end; there is no change there. The only thing changing is how you're transporting those join messages.

Hooman: I think I kind of get it, but the part where I was scratching my head was that when you send the join, those core routers need to kind of understand IPv4, so I didn't understand the saving. You need to have IPv4 in those core routers to send the join all the way from R1 to R5; excuse me, you need to have some kind of IPv4 unicast forwarding knowledge, so it becomes a little bit of a dual-stack thing. So I don't know.

David Lamparter: The answer is quite simple: you don't need the addresses anymore, and IPv4 addresses cost money these days, so the goal is to get rid of the addresses, not of the V4 capability in the router. People are just not giving the links IPv4 addresses anymore. In some cases the router has one IPv4 address total, and in extreme cases it has zero IPv4 addresses. That one doesn't quite work yet; the int-area people are working on extending things like ICMP for that. But having only one IPv4 address on the entire router, and then V6 on all the links, is getting quite common now.

Stig Venaas: Yeah, so one thing is having the addresses, but there's also the management of, you know, adding the addresses. It sounds trivial, but people want to simplify as much as possible.

Hooman: Yeah, I mean, to be honest with you, even with Tree-SID we were kind of saying, well, why can't you just have one IPv4 address for the root address, because in Tree-SID we say the root is 32-bit, so it's IPv4. So we were saying, hey, let's just have one IPv4 address, but the comment we got back was: well, if I'm going to IPv6, why would I want to have one single IPv4 address? So that's the part that I'm kind of, you know...

Stig Venaas: Right. So the main idea is: we have 5549, which people are deploying for unicast, and we want to do something similar for multicast. And I believe it is quite similar; just like with BGP for 5549, the outer header is IPv6 but the contents are all IPv4, and that's what we are doing here as well. We're trying to make the minimal amount of changes to PIM to make this work, and the only things we need are this hello option and to know that routers can accept the slightly more liberal mixing of V4 and V6 than is currently in the RFC. So from a protocol point of view, it would basically be adding a hello option and documenting those minor changes compared to what 7761 expects today. Okay.

Mike McBride: Do you want to call for adoption?

Stig Venaas: Um, that's up to you. I can... I'll do it, I'll start it.

Mike McBride: Okay, thanks. While we're waiting for people to respond: with what you can divulge, is this a customer request of yours, or is this just something fun you've come up with?

Stig Venaas: No, there are customers that are interested. Basically, there are people currently using 5549, or whatever the newer RFCs are today, and they are doing multicast, but they have to configure v4 addresses just because of PIM.

Mike McBride: Okay, thank you. So we have eight, now nine people so far that are in favor of this and none against, so we'll take it to the list. It looks like there's interest. All right, I'll let it go for a little bit. Toerless, you are up next. Yep. Okay.

Toerless Eckert: All right, here we go again. But after 40 years, what's one or two more IETFs? So after IETF 124 we put this draft in for early reviews with a bunch of directorates: SECDIR with Brian Weis, INTDIR with Pascal Thubert, IOTDIR with Erik Nordmark, RTGDIR with Sandy Zhang, so thanks a lot to all those directorate reviewers. Still waiting for Michael Tüxen for TSV. In general I think that worked out very well, so if you have a draft that needs more review input, try to figure out who the good usual suspects would be, because I went through the different directorates trying to find names that actually knew what multicast meant, and that's why I think we got very good feedback. I'm not going to go through the whole set of textual nitpicking improvements. There was one slightly hilarious thing with Erik's review: I think he still thought IP meant IPv4 only, even though the document tries to emphasize that IP in this document now means IPv4 or IPv6, so I added a sentence in blinking red at the beginning of the document to re-emphasize that, because the IPv6 RFCs like 8200 and others never use the term host group. He was afraid that term was something only applicable to v4, so I also added an explanation of why it obviously applies equally to v4 and v6, but you don't have to write "host group" if everything you're writing applies equally whether you're talking about ASM or SSM. So we're really getting into the details of the terminology, which I think is a good thing, because ultimately if you have a full standards-track document and you don't know exactly which word to use, that becomes kind of a problem.
IGMP version 1 backward compatibility was also something that had to be specified a lot more. Backward compatibility means that any document that already mandates v2 or v3 is fine where it mentions v1; if anybody else mentions IGMPv1, that's out of commission, we don't want to see that anymore. The other terminology issue was the "SSM channel address", which is confusing because you never know what it means: the combination of the source and the group, or just the group address? The real term is SSM destination address, so that's now detailed in a new terminology subsection. And then, taking the terminology from the IPv6 documents: if what you're doing applies equally to ASM and SSM, you can just talk about multicast addresses, but if you have something specific to ASM or SSM, you need to talk about host groups for ASM, or about the (S,G) channel and SSM destination address for SSM, so people know which of the two you're talking about. So the requirement, really, is that if you're writing something about multicast, you should say in your document whether it applies equally to ASM and SSM or not. That's an ongoing issue: most people ignore SSM, which we obviously don't want, so at least they should say that explicitly, and that's hopefully something we can make sure happens consistently in future reviews. There was also another interesting new registry created by IANA in the past half year that I stumbled across and added. And then Sandy brought up the question of the applicability of the new addressing work. There is this Appendix A2, which I hadn't touched, where Steve was, 40 years ago, talking about how to make dynamic multicast address assignments. None of his text was ever adopted; we had a lot of other stuff that was adopted into RFCs and never happened.
So I basically added another subsection to that, with three paragraphs. One is about RFC 6308, which I think is the best summary of all the good stuff we tried that didn't work. Then one mentions SAP and SDP, because that was really the best working dynamic address assignment we ever did; it worked perfectly fine for ten years. And the third paragraph is about the new one we've been starting to do, which is the ZeroConf multicast address work and PIM GAP. So the door was open with this appendix existing, and that gave the opportunity to put some current information into it. The other appendix that was difficult was Appendix A3, which, after Erik being confused and me being confused, is now hopefully a lot better. So you see we're only at the fringes of the document, nothing about the real specification text, obviously. One part is the link-local address considerations, and the other is for the multicast router, and that is the one thing I'm still pondering; there are one or two things I want to get right in one more revision. So: what if an IP multicast router also has an IP multicast host stack? This RFC is really only for the host stack. Do we feel it applies to a router, and if so, how? All the implementations that I know of started out in the 90s not wanting to know what a host stack is, but then, at the latest with something like Linux in 2008 or 2009, they started to have the normal host stack alongside the router stack.
And if you're building an application that uses some link-local multicast, like OSPF or any other routing protocol, v4 or v6, you typically just need a host stack. So I very much think, and this is what the text currently says, that if you're a multicast router you should also have a host stack, but obviously not all the requirements that are mandatory for non-routers apply equally. For example, there are the link-local addresses, which don't really apply to routers because they're not routed, which is why there is this Appendix A3. And there is also the big point that you don't necessarily have to use IGMP or MLD to get all the multicast traffic you want if you are a multicast router: even in the face of an IGMP or MLD snooping switch, you're always expected to get all the traffic, because IGMP and MLD snooping cannot constrain traffic toward you anyhow. If you're a PIM router, they could at best do PIM snooping, but that's out of scope here. So that's the tricky part. What I'm trying to achieve, obviously, is to figure out the best requirements we can write so that everything that actually works well today is still in scope, without anything that works well being contradicted by the text. Right now it's fairly lightweight and says you should have a host stack but you may optimize and leave out requirements like IGMP or MLD; that's what I want to rethink a bit more, to be very much on the safe side. Because then came the last review, from Pascal Thubert, and there is this whole other world called IoT doing a lot of cool and crazy things that I haven't been tracking. One of those is something you could consider a replacement for MLD, which is RFC 9685, so you see it's fairly recent.
And I still have to think about what that is: whether it should really be considered an alternative to MLD, in which case it might be mentioned, or whether it's actually rather a replacement for, or an alternative routing protocol to, RPL... sorry, to PIM. So that's the one question I'm pondering, and I'm trying to find some other people to talk to about it. Because right now we obviously have the MUST for IGMP/MLD, and I just want to make sure that is still valid in cases where, for example, you have an RPL host which just does RFC 9685. That's why I'm asking for a little more time to come up with another rev for that. There was also another early review from IANA arriving on Friday, so I'll need to do some fixes for that, but that's pretty much it. Yep, infamous last words: just one more thing and then we're ready to go, I hope. But you know, I hope we're really getting the quality up from 99 to 99.5 percent. Perfect. So thank you very much.

Stig Venaas: Yeah, I'm not too serious about this, but there is something new called sub-link-local scope or something now.

Toerless Eckert: Oh, okay. Well, bring it up before I forget it. I mean, the whole point is just to have the best possible text, so that everything that works well is within scope and the text embodies that, right? It's the host stack. Whatever we can do to make it the best, I think we should, because there is not going to be another one in the next 40 years.

Stig Venaas: Yeah, so thanks a lot, you've done a lot of work on this. And please have a look at the latest version if you're interested. I should also say that we're trying to do directorate reviews a bit more; it was a good idea to do this, and for some of the multicast discovery drafts we're also doing early directorate reviews now.

Toerless Eckert: Yeah, you need to find people. You need to go through the directorate list and find the usual-suspect names that might actually be good at it, because many of the directorates usually just do a round robin. Which has the great benefit that people have no clue about the topic and can really complain from the ground up, like "I don't understand what you're writing at all." So that's a benefit, but the downside, of course, is that you're not getting an in-depth review of the technology if you only do that.

Stig Venaas: Yeah. Okay.

Sunny Zhang: Good afternoon, everybody. This is Sunny Zhang from ZTE, and this time I'd like to introduce a new multicast use case for LLM synchronization. I'm presenting this draft on behalf of my co-authors Yisong and Jingye. LLM means Large Language Model. We know that there are emerging inference cloud services that can deliver large-scale real-time inference, fine-tuning, and model optimization services on GPU cloud platforms. The GPU cloud platforms may be spread everywhere, so there is multi-cloud LLM synchronization: centralized model repositories automatically replicate and sync the LLM to GPU clouds across many regions or different carrier networks. So it's a new service. And there are some challenges for this service. The first one is high concurrency: a popular large model can be delivered to many dozens of GPU clouds simultaneously, and the size of the model can be 70 GB to 1 TB. Because the model is so large, this can lead to I/O bottlenecks at the storage repository, delaying model distribution at scale. The second challenge is cold-start latency: the inference service cannot start until the model is fully downloaded to the GPU cloud, so if the download efficiency is low, the cold-start latency will be very long and users will be impacted very much. Though this service is separate from the LLM training and inference processes, the synchronization process directly affects the efficiency and reliability of inference service delivery.

Stig Venaas: Can I ask a question? I'm just wondering about the cold start. When does that happen? Is that happening often, or...?

Sunny Zhang: Yes. That means that when you download the LLM and run it, it takes time for the LLM to start working. If the model downloads very slowly, you may have to wait several hours or more for the LLM to start working.

Stig Venaas: Okay.

Sunny Zhang: It's not very good, yeah. So we know that this synchronization is very typically a multicast use case, but only unicast is used for now, and there are these I/O bottlenecks and the cold-start delay. If we use multicast for the LLM synchronization, we can reduce the I/O bottlenecks from the simultaneous downloads, improve the transmission efficiency, and minimize the cold-start latency. And since the GPU clouds may span multiple regions and operators, the multicast technology must be operable across core and metro networks. Yeah. So here is some analysis of the candidate multicast technologies. The first one is the most famous, PIM-SM. It requires a multicast tree to be established in advance, all nodes need to maintain state information, and it may be slow to respond to network topology changes, so we think it suits scenarios where the set of destination GPU clouds is relatively fixed. The second candidate is SR P2MP. It relies on a controller to implement multicast traffic engineering, so we can use better paths for the download traffic, but the replicating nodes still require state. We can build a multicast tunnel beforehand via the controller, and it is also slow to respond to network topology changes, so we think it is likewise suitable for scenarios where the set of destination GPU clouds is relatively fixed. The third multicast technology is BIER. It's a stateless multicast technology with no need to establish a multicast tree in advance, and it responds quickly to network topology changes, so we think it is suitable when the set of GPU clouds is very flexible and... Yes.
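To make the "stateless" claim about BIER concrete, here is a toy version of BIER forwarding; the topology and bit assignments are invented for illustration. Each egress router owns one bit in the packet's bitstring, and a transit node just ANDs the bitstring against each neighbor's forwarding bitmask (F-BM), so adding or removing a destination GPU cloud only changes which bits the ingress sets, with no per-flow tree to re-signal:

```python
def bier_forward(bitstring: int, bift: dict) -> dict:
    """Return {neighbor: bitstring copy} for every neighbor whose F-BM
    overlaps the packet's bitstring; each copy keeps only the bits that
    neighbor can reach, so no destination is served twice."""
    copies = {}
    remaining = bitstring
    for neighbor, fbm in bift.items():
        if remaining & fbm:
            copies[neighbor] = remaining & fbm
            remaining &= ~fbm          # don't duplicate toward other neighbors
    return copies

# Hypothetical BIFT at a transit node: neighbor A reaches the egress routers
# owning bits 0-1 (mask 0b0011), neighbor B reaches bits 2-3 (mask 0b1100).
bift = {"A": 0b0011, "B": 0b1100}
print(bier_forward(0b1010, bift))  # {'A': 2, 'B': 8}
```

Contrast with PIM-SM or SR P2MP: there, the equivalent of the bitmask logic is per-tree state installed on every replication node in advance, which is what makes membership changes slower to take effect.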

Humen: Humen, from Nokia. Sorry, I'm just wondering: why do you say SR point-to-multipoint is slow to respond to network topology changes?

Sunny Zhang: We just think that, compared to BIER, it may take some time to converge, because if the replicating nodes change, you must download the new state into the new nodes, right?

Humen: Yeah, that's not completely true, because SR point-to-multipoint can use unicast LFA and all this other stuff, so if anything I would say it's in line with BIER or PIM. Just for correction.

Sunny Zhang: Oh yes, yes. We can add some description in the draft to state that if we build FRR tunnels or unicast tunnels for it, it can converge very fast. Right, yeah. Okay.

Sunny Zhang: Yeah.

Jeff Tantsura: Jeff Tantsura, quick comment on that. About seven or eight years ago we published an RFC on LFA for mLDP.

Sunny Zhang: mLDP? Oh yeah.

Jeff Tantsura: You could reuse it exactly as-is for doing IP fast reroute here; just have a look.

Sunny Zhang: Yes, we can add some statement about mLDP and RSVP-TE P2MP. Oh, if it would be...

Jeff Tantsura: I mean, those are pretty much dead technologies by now, but the idea is exactly the same: how you compute unicast fast reroute for multipoint services.

Sunny Zhang: Okay, okay. That's good. Yeah.

Humen: Yeah, and that's a great point. That refreshed my memory; we use MoFRR too, and all that stuff.

Sunny Zhang: Okay, okay. We will add it in the next version. Yeah. So, comments and suggestions are welcome, and we can discuss more detailed requirements or potential gaps here. Yeah.

Jeff Tantsura: A couple of comments here. So, I think by now it's pretty clear. Your draft is talking about MoE, right? The MoE decision is taken in about 5 to 10 microseconds.

Sunny Zhang: That's because, for now, if you want to use the inference cloud services, you have to wait for the LLM download. If your network is very fast, it may take you just half an hour, but if your network is not very good it can take several hours or more to wait for the LLM to start working.

Jeff Tantsura: It has nothing to do with the network. There is a function called the MoE router, which has nothing to do with routers, basically. It's a function that decides which experts to use as part of the group, right? This is done at runtime, again and again, on the order of tens to hundreds of microseconds, which leads to the point that the tree has to be pre-established. You cannot afford to build the tree post-decision.

Sunny Zhang: But I think maybe you're thinking about the inference process in LLMs, right? That may take just microseconds. But this is about the model downloading: if you want to download the model, it may take several minutes.

Jeff Tantsura: If you want to download a model, you don't need multicast for that, with all due respect. The main use for multicast is to replicate traffic across MoE experts to perform a particular computation and return traffic, right? So you have a thousand GPUs. The decision of which eight or 32 of them to use as experts is made at runtime, within microseconds. The point being, you need the multicast state pre-established. You cannot afford to signal multicast post-decision; there's no time for that, you'll drop all the traffic. So you should really focus on how to pre-establish multicast state before you actually send data plane traffic over it. And the BIER data plane is interesting from that perspective because it's stateless: you don't need any state. Classical multicast without extensions is completely unsuitable for this. I'll actually publish a couple of drafts on extensions to both MLD and PIM for how to do it, but practically, that's the way to think about it: the multicast state must be completely disjoint from the data plane and must be pre-established. You don't have the normal multicast establishment flow, where you send the first packet, punt it, and figure out if there's something; it must all be done in advance.
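Jeff's argument can be sketched with invented numbers: if the replication state is a bitstring (as in BIER), then picking a runtime expert group is just OR-ing together pre-assigned bits, so nothing needs to be signaled between the MoE routing decision and sending the first packet. The expert-to-bit mapping below is hypothetical:

```python
def bitstring_for(experts: list, bit_of: dict) -> int:
    """Build the packet bitstring for a runtime-chosen expert group.
    This is the entire 'tree build' step: no per-group signaling,
    because each expert's bit position was assigned in advance."""
    bs = 0
    for e in experts:
        bs |= 1 << bit_of[e]
    return bs

# Hypothetical pre-established mapping from expert ID to bit position.
bit_of = {101: 0, 102: 1, 103: 2, 104: 3}
print(bin(bitstring_for([101, 104], bit_of)))  # 0b1001
```

A PIM join/prune cycle after the expert decision would take orders of magnitude longer than the microsecond budget Jeff cites, which is why the state has to exist before any data is sent.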

Sunny Zhang: Yes, maybe. But for now we just use static forwarding for it, right? Only unicast is used for the inference cloud services at the moment, so we'd like to introduce multicast technology for them to use. And...

Jeff Tantsura: Absolutely, but unless you look into the real use case it's kind of wavy, right? Because you're describing something that doesn't work, and I'm trying to explain how it actually works and where to focus.

Sunny Zhang: Yeah. So the first step is that we must persuade our customers to use multicast here, and then we can discuss in more detail how to deploy multicast along the path.

Jeff Tantsura: So you need to prove it's actually usable, right? In the way you present it here, it's not.

Sunny Zhang: For now, multicast is not used, yes. So we'd like to introduce multicast for the LLM synchronization.

Jeff Tantsura: We are going in circles; please listen to me and I'll stop. We all want to introduce multicast; it's absolutely efficient for this kind of distribution, right? What needs to be taken into consideration is that it's not traditional multicast, where you have host signaling establishing the tree and then you start forwarding the traffic. By the time you're ready to build the forwarding state it's a millennium too late, right? So you need to pre-establish this without any senders and receivers in the network to begin with.

Sunny Zhang: So we think that if we really want to deploy it in real networks, we must consider that different networks may have different multicast capabilities. Some networks can only support PIM, so we may use PIM for them, and in some networks we can deploy BIER, so we can use BIER there, right? And...

Jeff Tantsura: BIER is data plane, PIM is control plane. Don't mix them together.

Sunny Zhang: Yes, yes. So we must consider the network capability here. Yes.

Jeff Tantsura: We're not converging. I'll talk to you offline.

Sunny Zhang: Okay.

Mankamana Mishra: Mankamana, Cisco Systems. So with BIER being one of the options, how many bits are we expecting? This traffic will go to how many endpoints?

Sunny Zhang: The endpoints will be dozens of GPU clouds for now. Yeah. Okay.

Toerless Eckert: Toerless Eckert. So do you intend this to be within a data center, or across a more complex network?

Sunny Zhang: It's for the WAN, yes. It's for the WAN, not just within a data center. Sorry?

Toerless Eckert: Within the same operator?

Sunny Zhang: Because the GPU clouds may be across many regions, and even countries. So it will not be limited to one data center.

Toerless Eckert: Okay, so more complex wide area network.

Sunny Zhang: Yeah.

Toerless Eckert: Then I think you certainly want to look into the transport layer, the fountain codes, which allow you to avoid retransmission and all that stuff, right? We were working on all of that 20 years ago, and I think those things are all fairly well worked out. But if that is also being added, then just as an informational point, I think the whole thing would be suited a lot better for MBONED than PIM, because we're talking much more about a solution: here are the choices in the network, here are the choices at the transport layer. And the fact that this is just about sending, what is it, 100 gigabytes or so to a thousand receivers, because that's the next rev of your model, I think that should be clearly set aside as the use case, apart from the training stuff you were talking about. This is just wide-area distribution of a finished model so that a lot of inference nodes can then use it. And maybe up at the application layer the interesting question is: can this be done incrementally? Can it be done other than, okay, the data center that calculated the model is ready, and now we still need to wait, I don't know, one or two hours, at the low speeds that wide-area network links may have, before it can be sent? Can it be incrementally updated while it's being finished in the calculating data center? I think that's where you would need to talk to the training experts, to figure out if something like that would be possible, because that would certainly help a lot. Thank you.
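The fountain-code idea Toerless mentions can be illustrated with a toy XOR peeling decoder. The sender multicasts XOR combinations of source blocks, and each receiver reconstructs the file from whatever coded packets it happened to get, so no per-receiver retransmission is needed. The coded-packet set below is hand-picked so the example decodes deterministically; real fountain codes such as Raptor use carefully chosen random degree distributions:

```python
from functools import reduce

def decode(packets, n_blocks):
    """Peeling decoder: repeatedly find a coded packet with exactly one
    still-unknown source block and solve it by XOR-ing out the known ones."""
    recovered = {}
    progress = True
    while progress and len(recovered) < n_blocks:
        progress = False
        for idxs, payload in packets:
            unknown = set(idxs) - set(recovered)
            if len(unknown) == 1:
                i = unknown.pop()
                val = payload
                for j in set(idxs) - {i}:
                    val ^= recovered[j]
                recovered[i] = val
                progress = True
    return [recovered.get(i) for i in range(n_blocks)]

blocks = [0xDE, 0xAD, 0xBE, 0xEF]          # the "model file", four toy blocks
def xor(*idxs):                            # sender side: XOR of chosen blocks
    return reduce(lambda a, b: a ^ b, (blocks[i] for i in idxs))

# Hand-picked coded packets: (set of block indices, XOR of those blocks).
coded = [({0}, xor(0)), ({0, 1}, xor(0, 1)),
         ({1, 2}, xor(1, 2)), ({0, 2, 3}, xor(0, 2, 3))]
print([hex(b) for b in decode(coded, 4)])  # ['0xde', '0xad', '0xbe', '0xef']
```

The property that matters for the WAN-distribution use case is that which packets a receiver gets doesn't matter, only roughly how many, so one multicast stream can serve receivers with uncorrelated losses.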

Jeff Tantsura: Just one more comment on top: you are looking into reliable multicast, right?

Sunny Zhang: Sorry?

Jeff Tantsura: It must be reliable. The inner payload of the packets is an RDMA operation, right? You have a memory location on a remote site and you are either writing to or reading from there. You lose a single packet, and your entire memory block is corrupted.

Sunny Zhang: Yes. So we think that, related to the comments from Toerless, we can work with the transport layer to make sure...

Jeff Tantsura: Yeah, yeah, it's really in addition to what Toerless said.

Sunny Zhang: So...

Stig Venaas: Yeah, Stig here as an individual. I think it makes sense, as Toerless said, to split the two cases, because the cold start, or whatever you call it, you can handle just fine using existing multicast; there's no latency issue there, right?

Sunny Zhang: And the I/O bottlenecks.

Stig Venaas: Yeah. But for the inference, you really want to minimize the latency, so that's a very different problem. But I think it makes sense, to me at least, to do it in MBONED, maybe, if it's more like "this is what we can do with existing technologies," right?

Sunny Zhang: Yes, we also presented it in MBONED this time. Yeah.

Stig Venaas: But of course we should see if we can come up with something new or better, and then maybe PIM would be a good place, but also...

Sunny Zhang: We think all the multicast working groups can be relevant for this topic, right?

Mike McBride: Yeah, Mike McBride. My opinion is, and Stig and I need to talk about this, maybe with Jeff and others and Gunter, that this is probably larger than PIM and MBONED. There have been a couple of side meetings; you had one last year, and there's another side meeting that Yisong is going to have tomorrow on LLMs and AI and multicast and all that. And it's being presented in RTGWG, right? Your draft.

Sunny Zhang: Uh...

Mike McBride: because your draft...

Sunny Zhang: Yeah.

Mike McBride: Yeah. So I could be wrong, but it seems like these types of multicast- and AI-related drafts are kind of being incubated in RTGWG right now, is that right? Is your working group handling, or at least listening to, these types of drafts and...

Jeff Tantsura: Yes. As routing chairs we do a kind of dispatch, right, from this perspective, for routing drafts. So we are looking at all of them; that's why it's been presented there. And I would say, if it requires PIM or IGMP extensions, the work is here.

Mike McBride: Yeah, yeah.

Jeff Tantsura: And it does, for all practical reasons. You need to be able to signal additional metadata.

Mike McBride: Yeah, yeah. exactly.

Jeff Tantsura: The total solution is probably MBONED, and architecturally RTGWG is the right place.

Mike McBride: Yeah. So you've got three working groups that we've just mentioned, and who knows, maybe in the future there's a new working group that deals with multicast and AI, I don't know, but it's good that you're add...

Sunny Zhang: Let's have a relationship for multicast and AI. Yes.

Jeff Tantsura: No, but you definitely don't want me to do PIM extensions in RTGWG, right? It's not the right place; it's what you do here.

Mike McBride: Yeah, yeah, exactly. This is where the expertise is and all the right people, that's that's very true. Yep.

Jeff Tantsura: So, similarly to, I don't know, segment routing, where the LSR extensions are done in LSR and the architecture is done in SPRING, potentially you'll have the same split, I would think.

Mike McBride: Yeah, okay.

Toerless Eckert: But again, let's separate it out, right? This is just mass distribution of a huge data file. We know how to do that, and we've even done it in the past in several cases. So this is something ready to sell with whatever the best underlying existing network is: BIER, PIM, any other options people want. It's more like, if you don't already have multicast deployed and somebody comes and asks "how should I get multicast deployed?", then we give recommendations. And then I think any of these extensions and new stuff, that's for improving the training solutions, right? So...

Jeff Tantsura: MoE is extensively used in inference as well; that's the technology of today, at least. And the amount of data distributed is actually not that large; it's gigabytes at most, right? You distribute a reasonably small amount of data, you get the results, you do all-reduce and so forth. So the interesting consideration for inference is that latency is much more critical than for training. For training, you train offline, pretty much; it might take longer and cost more, but it's not critical. In inference, if it takes longer, the end user will go to a different provider, so you lose money immediately. So you should think about latency and the like as well.

Sunny Zhang: Yes. In this service, the LLM is trained in a central repository, and after training the model needs to be distributed to the GPU clouds, where every GPU can run the model and do inference work. As for the position of this draft: if it's only a use case, we think it may be better to put it in RTGWG, because many people focus on RTGWG and they can get more information in that working group. But if we have some extensions for PIM or BIER or something else, we can do the specific extensions in the specific working group. Right, yes. But for now, we just want to introduce this multicast use case...

Jeff Tantsura: Sunny, you need to progress. You've been chewing on the same stuff for the third IETF; there's no difference between versions of the draft, right? You really need to progress, or stop.

Sunny Zhang: So may I ask a question about RTGWG: is this work in the RTGWG charter?

Jeff Tantsura: You haven't defined anything that's PIM specific, right? So...

Sunny Zhang: No, in this draft there's no...

Jeff Tantsura: That's my point. It obviously should be presented here because it involves multicast, but in practice there is nothing PIM-specific, so I would question whether this is the right place. But again, independently of which working group you are in, if you keep presenting the same draft, eventually, you know...

Sunny Zhang: We just want to bring more people to work on this, because it's an emerging service, and it will be used very widely. Yeah. So... but that's okay. Thank you. Yeah.

Mike McBride: All right, so Lenny, sorry it's so late, but yeah, go ahead. I'm going to... how do we accept that? Oh, here we go.

Lenny Giuliano: No problem. I'm still wide awake.

Mike McBride: Okay. I think you have control right now.

Lenny Giuliano: Um. How do I... You gave me something.

Mike McBride: I did.

Lenny Giuliano: I did. Uh, I'm not... I want a screen share, not slide management. Am I screen sharing?

Mike McBride: No.

Mike McBride: Yeah, I'm going to take it back and then I'm going to send it to you again. So let's pass slide control to you. See if it's...

Lenny Giuliano: Yeah, I don't want slide control, I'd like screen control, screen share. But I don't see any other options. I can share screen, but...

Stig Venaas: Can't he just select share screen, and then you have to give permission?

Mike McBride: Uh Lenny, can you try to select share screen?

Lenny Giuliano: Uh say that again?

Mike McBride: Are you trying to select share screen or...

Lenny Giuliano: I don't see anything. Am I supposed to be seeing something? I requested screen share.

Mike McBride: Oh you did. Okay.

Lenny Giuliano: I think. Here, let me stop. let me rerequest. Okay, there. Oh wait.

Mike McBride: Yeah so we see that you requested it but um hmm.

Lenny Giuliano: Yeah, how do you grant me the ability to...

Stig Venaas: It's not working?

Mike McBride: But do you only have slides or do you have some other content or...?

Lenny Giuliano: I have slides, but they're PowerPoint slides because there's a lot of animation.

Mike McBride: Okay, so the PDF will kill the animation. In the chat, someone from Meetecho just said what to do.

Lenny Giuliano: Okay, hold on. Let me look at that chat. Uh there's a button... Okay. Uh yeah okay I think we found it here. Oh wait. Ah there we go. Wait...

Mike McBride: Uh what did you have to do?

Lenny Giuliano: I had to close Sandy's screen. Okay. What do you see right now?

Mike McBride: We see it. Just go to full screen. Yeah, you're good.

Lenny Giuliano: Okay. All right, cool. So I'm going to be presenting this proposal, on dynamic internet multicast tunneling, on behalf of my co-authors. As we all know better than anyone, multicast requires every layer-3 hop between sources and receivers to be multicast-enabled. Okay, I'll just pause: is it working right now? Do you see...?

Mike McBride: Yes.

Lenny Giuliano: Okay. All right, so to overcome this hurdle, tunnels have traditionally been used. Frequently they've been static tunnels; GRE was most common. The issue with GRE is that it requires manual configuration on both ends, so you need coordination and literally manually configuring both ends. Plus, you need to run routing protocols through those tunnels so that RPF works. There do exist dynamic tunnels, specifically one example is AMT, which are essentially zero-config. The challenge with AMT is it doesn't allow routing protocols to traverse; the only protocol that really traverses an AMT tunnel is IGMP. What that means is that RPF is not possible, so it doesn't work unless there's really only one relay in the world. And the use case is that content providers and CDNs would like zero-config tunnels. So they like that dynamic, zero-config part about AMT, in the middle mile, where there could be many different relays. And the question is: if you want a router to function as an AMT gateway, how do you do that if you don't know who the relay is, and you don't have routing information as to how to get to that relay, or which relay has the correct reachability to the source? Specifically, how do they know which relays can reach which sources? I'm going to show a picture in a moment that illustrates the problem we're trying to solve. So the solution we're proposing is: in the BGP route to the source, we're going to add a BGP extended community, and that extended community will have the AMT relay embedded within it. And that relay must have multicast connectivity, either tunneled or native, to the source that is being advertised.
And AMT is the most obvious use case here, but it can also be used for other dynamic tunneling mechanisms, such as those that PIM-Light enables. All right, so just a very brief refresher on how AMT works. You have a multicast-enabled network: a multicast-enabled content provider, a multicast-enabled local provider, and multicast flows natively, the way God intended. But for the rest of the internet, the other 99-plus percent that's unicast-only, you have an interested receiver who sends an IGMP report to a router that doesn't support multicast, and nothing happens. Unless there's an AMT gateway on this end client, and the client magically discovers the relay, builds an AMT tunnel, sends an IGMP report through that tunnel, and the relay joins natively as if the gateway were directly connected; and as there are more gateways, they each get separate tunnels. And this is the basis for the TreeDN architecture, which is RFC 9706, which is basically SSM plus AMT. You have the big-I internet that is mostly unicast-only, and you have these TreeDN providers, which are native multicast-enabled networks. If you have native sources with native receivers, the traffic goes natively; and if you have off-net receivers, receivers sitting on unicast-only networks, we have these AMT relays: the traffic goes natively to the relay, and it gets tunneled via AMT to each of the receivers. All right, so that's how the world works today with AMT and TreeDN. What we're proposing here is an extension to this, to support dynamic middle-mile tunnels. So we'll go back to that big-I internet that's unicast-only. And we have three multicast islands: island 1, 2, and 3. These are islands of multicast-enabled networks that are not directly connected; they are separated by a unicast-only abyss.
So imagine you have sources in multicast island 1, there's source 1, and in multicast island 2 you have source 2. And in multicast island 3, you have a native receiver; you also have a few off-net receivers that are close to that multicast island. So we're going to add some AMT relays and an AMT gateway. And imagine these interested receivers all send IGMP reports to their directly connected router or nearest AMT relay. Now the question is, how do these routers then know which relay to use for which source? Eventually, maybe this traffic gets to a gateway, and the gateway says, "Okay, what relay should I use to get to source 1? What relay should I use to get to source 2?" And the answer is, the relay will be embedded in a BGP extended community in the BGP route to the source. So we can see that the ASBR in multicast island 1, when it advertises the route to source 1, adds an extended community that specifies the relay that has multicast connectivity to it. And likewise, the ASBR in multicast island 2 does the same. And these routes propagate across the network, so all of these routers now know: if I want to get to source 1, relay 1 is one way to get there. So I can decide, as a router: if my next hop is PIM-enabled, multicast-enabled, I will send a PIM join; but if it isn't, and I happen to be an AMT gateway, I can build an AMT tunnel and send an IGMP report to that relay. And that's what happens here. So we see these red arrows are PIM joins, which then turn into AMT after the AMT tunnels are built from gateway 1 to the respective relays for the respective sources. And then here's the data flow: it's native through the multicast islands, and then via AMT both to the off-net receivers and to the gateway in the middle mile.
And what we're specifying here, what we're proposing here, is these middle-mile tunnels between relay 1, relay 2, and gateway 1. They can be AMT, but could also be built with PIM-Light.
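The per-source decision Lenny walks through can be sketched roughly as follows. This is a minimal illustration only: the extended-community type, sub-type, and field layout below are placeholders, not the encoding the actual draft defines.

```python
import ipaddress
import struct

# Hypothetical transitive IPv4-address-specific extended community
# carrying the AMT relay; the real codepoints would come from IANA,
# so these values are placeholders for illustration.
AMT_RELAY_TYPE = 0x01
AMT_RELAY_SUBTYPE = 0x99

def encode_amt_relay_community(relay_v4: str) -> bytes:
    """Pack an 8-octet extended community: type, sub-type,
    4-octet relay address, 2-octet reserved field."""
    addr = ipaddress.IPv4Address(relay_v4)
    return struct.pack("!BB4sH", AMT_RELAY_TYPE, AMT_RELAY_SUBTYPE,
                       addr.packed, 0)

def decode_amt_relay(community: bytes):
    """Return the embedded relay address, or None if this is not
    an AMT-relay community."""
    t, st, addr, _ = struct.unpack("!BB4sH", community)
    if (t, st) != (AMT_RELAY_TYPE, AMT_RELAY_SUBTYPE):
        return None
    return str(ipaddress.IPv4Address(addr))

def join_toward_source(source, rpf_next_hop_is_multicast, route_communities):
    """Decision a gateway-capable router makes per source: PIM join
    if the RPF next hop is multicast-enabled, else tunnel an IGMP
    report via AMT to the relay learned from the route."""
    if rpf_next_hop_is_multicast:
        return ("pim-join", source)
    for c in route_communities:
        relay = decode_amt_relay(c)
        if relay is not None:
            return ("amt-igmp-report", relay)
    return ("no-path", None)
```

For example, a router whose RPF next hop toward source 1 is not multicast-enabled, but which holds a route carrying the relay community, would return `("amt-igmp-report", <relay address>)` and build the tunnel there.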

So what are the implications of this approach? This is a flexible architecture that allows core routers to become AMT gateways. Previously, routers could be gateways, but again, how do they know which relay to use for which source? It kind of only works in a world with one relay. When you have more than one relay, how do you know which one to use? So this enables routers to figure out which relay to use. It also allows routers to be both AMT relays and gateways: they can be a relay for downstream end clients, and then a gateway to join the traffic upstream. And it essentially extends the TreeDN architecture to support middle-mile tunneling, because TreeDN previously only really covered last-mile tunneling.

What's the use case? Why would somebody want to do this? Again, there are content providers and CDNs out there that want to originate and transport multicast content that can be received by multicast islands downstream, anywhere on the internet, that are not directly connected. You don't have a direct PIM peering, and you'd like to have a zero-config tunnel. GRE could solve this problem, but GRE requires you to statically configure the tunnel source and destination on both sides. And there are CDNs out there that have said, "We don't want to do that. We would like to just have this relay out there; we tell the world, here's the relay, if you want this content, use this relay," and the traffic can be sent to downstream networks over those tunnels.

So we're presenting this in MBONED. We think MBONED is probably the right working group for this, because this is really just an AMT relay discovery mechanism, and that's where AMT relay discovery mechanisms have traditionally lived. That said, we'd like to seek feedback from this working group and make this working group aware of this work, and we'd love to hear thoughts and ideas. And that's everything I had to share. Hopefully I wasn't on mute and you guys are still there.

Stig Venaas: Yeah, Stig here. I have a question. Are you discovering the nearest relay, or the relay that gives the shortest path altogether?

Lenny Giuliano: You are discovering... what's the difference?

Stig Venaas: You might have a relay really close to you, but the multicast path from the source to the relay is pretty far.

Lenny Giuliano: So, yes. I should have mentioned this: transit relays along the way can actually overwrite it and add their own relay address. So this enables the ability to have cascading tunnels. Let's say relay 1 is the one closest to the source, and it says, "I am the relay for this source," in the extended community that it advertises for this network, and that gets propagated throughout the network. A downstream network could say, "Well, I have a relay too; I'm relay 2, and I am going to set relay 2 in the extended community, so that networks downstream of me can use my relay to get to it." And this is where, you may recall, there was DRIAD, RFC 8777, which essentially used DNS to specify the relay, and you did these DNS lookups. That didn't allow more than one relay IP address for a given group, and it couldn't change. This would enable the nearest relay to the receiver to get used, rather than the nearest relay to the source. Does that answer your question, Stig?
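The cascading behavior Lenny describes, where a transit relay rewrites the relay address before propagating the route, can be sketched like this. The route representation and relay names here are purely illustrative.

```python
def readvertise(route, my_relay_addr, i_am_a_relay):
    """Sketch of cascading relays: a transit AMT relay overwrites the
    relay address in the route it propagates downstream, so downstream
    gateways tunnel to the nearest relay rather than the one closest
    to the source. 'route' is a plain dict, used only for illustration."""
    out = dict(route)
    if i_am_a_relay:
        out["amt_relay"] = my_relay_addr  # override the upstream relay
    return out

# Relay 1 (nearest the source) originates the route with itself as relay:
r = {"prefix": "198.51.100.0/24", "amt_relay": "relay1.example"}
# Relay 2, downstream, rewrites the community before propagating further:
r2 = readvertise(r, "relay2.example", i_am_a_relay=True)
# Gateways downstream of relay 2 now tunnel to relay2.example, building
# a cascade of tunnels: gateway -> relay 2 -> relay 1 -> source network.
```

A non-relay transit router would propagate the route unchanged, so the nearest upstream relay always wins for any given gateway.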

Stig Venaas: Right. Yes, so I think it usually makes sense to use the one closest to the receiver, right? So that you do multicast as much as possible, to take advantage of multicast.

Lenny Giuliano: Correct.

Stig Venaas: So I think this could also allow you to do the shortest path to the source, in a way. You can kind of choose how you want your metrics or whatever, but I think closest to the receiver makes sense, yeah.

Lenny Giuliano: Yes. So that's what this is really doing: it is allowing a downstream AMT gateway to pick the nearest relay to the gateway, not the nearest relay to the source. Which is, I think, what you're saying.

Stig Venaas: Right. Yeah. Okay, thanks.

Lenny Giuliano: Sure. Any other questions?

Mike McBride: I think that's it. Thank you, Lenny.

Lenny Giuliano: Great. Thank you guys.

Stig Venaas: All right, Humen.

Humen: I thought they just forgot we are getting older in the IETF and we need to take a larger step. Okay, so let me give you an update about PIM in... sorry, BIER interop at the EANTC: what we've been doing for the past couple of years and where we are. Basically, this year what we tried to do is bring up next-generation MVPN with BIER between the different vendors, and that's what we tried last year too. And literally all we used was the inclusive PMSI; we didn't go into the S-PMSI, the selective PMSI. Maybe something we can try next year. But there was some good news. Just to take you through it... I mean, you can look at that later. So what happened was that we started in 2024: all the routers that you see here were BIER forwarding routers. None of them were edge routers. The IXIA, the source, was pumping in BIER packets. So in 2024 we kind of accomplished that: Juniper, Huawei, and Nokia can all forward the BIER packet, and they can actually replicate the BIER packet. That was 2024. In 2025 we started looking at next-generation MVPN (the N is missing there because it's in white, so it's not G-MVPN, it's next-generation MVPN). We brought up this setup with Juniper back then, and Huawei, and this table kind of explains what we went through. So in 2025 we did have complete interop between the Nokia routers and the Huawei routers. The reason for that was that both Huawei and Nokia were using the upstream-assigned NG-MVPN label in the PTA, and as such they were using BIER Next Protocol equal to 2. Back then, Juniper was using downstream-assigned, and they were using BIER Next Protocol 1.
So what was happening is that the signaling came up, we had BGP completely talking to each other, but when we started sending traffic, Huawei and Nokia were accepting traffic, getting rid of the BIER header, and forwarding the traffic to the IXIA, so there was complete traffic between those two routers. But anything coming from Juniper to Nokia and Huawei was dropped because of the BIER Next Protocol, and anything from Nokia and Huawei going to Juniper was dropped, again because of the BIER Next Protocol. So that Next Protocol is actually checked in the data path, from what we've been testing, on all three routers. When the data path looks at the BIER header and sees that the Next Protocol doesn't match what the signaling is doing, it just drops the packet. There is no lenient way to bypass that BIER Next Protocol check. Another thing: when we did this in 2025, we used RFC 9573, which is the domain-wide common block, between all three of them, so the signaling was perfect. I think part of this slide is mis-rendered; the table is missing at the bottom. That's okay, I'm not going to go into it. So what we did in 2026 was just HPE and Nokia. This was the network that we were working in: the overlay was again MP-BGP IPv4, the underlay was IS-IS. Obviously you need unicast routes to resolve the source of the multicast routes, and for the unicast we used VPRN over LDP. The reason we didn't use segment routing is that we were short on time and were having some issues with segment routing, so we decided to go with LDP, which was the easier way to do it. And for MVPN we again used BIER I-PMSI, and the address family was IPv4.
So in this case the IXIAs were pumping in traffic and doing IGMP joins from the receiver point of view, and then on the other side, at the source, the IXIA was doing PIM to actually send the traffic back through the NG-MVPN to the receiver, to the leaf PE. Both of those IXIAs were receivers and sources. That means the Nokia was the root PE and the leaf PE, and the HPE was the root PE and the leaf PE too, so the traffic was bidirectional. So here's what happened. Nokia is still supporting upstream-assigned MVPN labels; we didn't switch that, we didn't go to downstream. Nokia still uses BIER Next Protocol 2, but just for the fun of it, we put a configuration in our CLI that says, "Hey, if you get BIER Next Protocol 1, process it. Don't black-hole it. Make sure you process it even though you're upstream-assigned." And literally, underneath the hood, that means the NG-MVPN label is context-aware. And HPE's NG-MVPN label is not context-aware; it's VRF-specific. The whole thing was working. Meaning that I set that knob in the data path, I said, "Nokia, accept BIER Protocol 1. Nokia, send BIER Protocol 1." And we were accepting traffic from HPE, we were sending traffic to HPE, and BGP was working fine. So everything really started interoping. So as of now, where it stands is, Nokia can work with HPE and Huawei, no problem. But this brings up a question that I think we need to answer again in BIER, which I keep scratching my head about: I really don't understand why we have BIER Next Protocol 1 or 2 for downstream-assigned or upstream-assigned, as this interop experiment kind of proved...
I shouldn't say proved; it pinpointed that, really, on the data path it's just saying that MPLS is the next protocol, a label is the next protocol. This upstream-assigned and downstream-assigned stuff really didn't make any difference in the signaling or on the data path itself. So now everything is rosy. It seems like we have good next-generation MVPN interop. We can have Nokia routers, HPE routers, and Huawei routers be transit, be PEs, and it's all good. Everybody is now doing DCB. One thing that I just want to mention here (I don't know if there are Huawei guys here): we had a bug. We put the label in the bottom 20 bits of the PTA's label field, where RFC 6514 actually says to put it in the top 20 bits. Little-endian, big-endian. Something for other vendors to check too, because I think other vendors might have the same type of problem. These are just some screenshots of what's going on. That's basically the story on BIER going forward. Any questions, comments?
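The label-placement bug described above comes down to a bit-shift: RFC 6514 defines the PMSI Tunnel Attribute's MPLS Label as a 3-octet field with the 20-bit label in the high-order 20 bits. A minimal sketch of the correct packing, and of what a conformant receiver sees when a sender packs the label into the low-order bits instead:

```python
def pack_pta_label(label: int) -> bytes:
    """RFC 6514 PMSI Tunnel Attribute: the MPLS Label field is 3 octets,
    with the 20-bit label in the HIGH-order 20 bits (low 4 bits zero)."""
    assert 0 <= label < (1 << 20)
    v = label << 4  # shift into the top 20 of the 24 bits
    return bytes([(v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF])

def unpack_pta_label(field: bytes) -> int:
    """Conformant decoding: take the high-order 20 bits of the field."""
    v = (field[0] << 16) | (field[1] << 8) | field[2]
    return v >> 4

# The bug: packing label 100 into the LOW-order bits yields b"\x00\x00\x64";
# a conformant receiver decodes that as 100 >> 4 == 6, the wrong label.
```

So a sender with the low-order bug advertises what a correct peer interprets as label >> 4, which explains why the mistake surfaces only in interop rather than in same-vendor testing.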

Jeffrey Zhang: Jeffrey Zhang. Thanks for the report. On the earlier question you had about the proto 1 and proto 2 thing: we've had this long discussion since we first discovered this problem. I guess the key confusion is upstream-assigned versus downstream-assigned. So in that BIER draft, the one you, me, and Toerless co-author, we're just trying to move away from the terms upstream-assigned and downstream-assigned. We simply say proto 1 versus proto 2. If you see a proto 1 packet, you always look it up in the global, downstream FIB in the most efficient implementation, but a particular implementation can choose to still use context label space tables, as long as you have these labels programmed everywhere and still treat them appropriately. So from the standards point of view, I think we were getting to the point where we can properly document this in that draft and move forward with that.
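The receive-side dispatch Jeffrey describes can be sketched as follows. The table names and structures here are illustrative only, not from any particular implementation.

```python
# For BIER Next Protocol 1, the exposed MPLS label is looked up in the
# global (downstream-assigned) label table; for Next Protocol 2, it is
# looked up in a context label space scoped to the upstream BFIR.
def resolve_payload_label(next_proto, label, bfir_id,
                          global_fib, context_fibs):
    if next_proto == 1:  # downstream-assigned
        return global_fib.get(label)
    if next_proto == 2:  # upstream-assigned
        return context_fibs.get(bfir_id, {}).get(label)
    return None          # unknown next protocol: drop

# An implementation that programs the same label consistently in both
# the global table and every context table (the DCB idea) can accept
# either next-protocol value for that label.
```

This is why the Nokia knob described earlier works: once the same NG-MVPN label resolves identically through either path, the proto 1 versus proto 2 distinction stops mattering on the data path.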

Humen: Yeah, I mean, again, my comment is that that's fine, as long as BIER Next Protocol 1 doesn't specifically say that it is globally assigned, because if we say that specifically, then Nokia is not at par with the RFC. I'm not sure why we're talking about all these crazy BIER details here in PIM; I thought we'd already cleared that up last IETF in the BIER working group. So I see no big issue with that.

Humen: Yeah, let's talk about it in BIER. Fair, fair. No, I mean, in the larger scope of things, the thing I'm missing a little bit, for things like DCB that require configuration, is that at some point in time we also need a YANG model, right? Because right now we're depending on every vendor coming up with their own different way of configuring DCB. I haven't even found the right example user documentation, so just from an IETF perspective, I'd love to have the DCB configuration model in there.

Humen: That's a very good point, actually. So for DCB it's very funny, because the way we do it, to us it's just a statically assigned VC label, a service label, right? I don't signal the fact that this is a DCB label; there's a bit, I think, but we don't signal anything, which is a good idea. All we do is go into the context of the next-generation MVPN and say that for RT X, use label Y. So anybody using RT X throughout the network uses label Y, and we just advertise those. Even though we are upstream-assigned, we are advertising the same label, which obviously saves labels.

Jeffrey Zhang: We're in that kind of continuum, right? The whole context-label approach was based on the minimum amount of stuff you need to configure explicitly. And the simplicity of DCB is, well, now you need to configure all these labels consistently, right? So: more configuration, but cheaper in the forwarding plane. And without having at least the configuration model there, so that customers know, "Okay, I need to set up some controller that pushes down the DCB labels," it's hard for them to make an operational choice.

Humen: So I'm going to be honest with you. The reason I went through all these pains of putting so many knobs into our implementation, to make it work with HPE and with Huawei, was to make sure there is a protocol working between so many vendors, to show the industry that this technology is viable, right?

Jeffrey Zhang: No, I mean, those are all perfectly fine. I'm just going back to your initial mumble about why the heck we even have that distinction.

Humen: Mumble, is that what you think of my...

Jeffrey Zhang: For lack of a better...

Humen: Thanks, thanks man. Appreciate it.

Jeffrey Zhang: Right, so I was just trying to give an answer to that, right? One is more self-configuring; the other is cheaper in the forwarding plane but requires more configuration. That's my answer. I'm going to stop mumbling now.

Jeffrey Zhang: Any other questions? I do appreciate you going the extra mile for the interop. I have another question; you probably mentioned it, but I missed it. So in the interop testing you used the I-PMSI. Obviously, to take full advantage of BIER, the S-PMSI would be better, because otherwise you would need more tunnels if you were not using BIER. So what was the reason for not using S-PMSI? Just the test resources, or...?

Humen: Time. We had to debug some stuff to bring it up again. I think HPE has two forms of BIER configuration, one is tun-... don't quote me on this, man, I don't know. You know this better than I do. I think there is one tunnel and one... so we had a little bit of back and forth to figure out which mode to use for BIER, and I just didn't have the time. So S-PMSI can be next year, for sure.

Jeffrey Zhang: Okay. Thank you. Yeah. Thanks.

Stig Venaas: Very cool. Thank you Humen. And that's a wrap. Thank you. We're done.