Session Date/Time: 16 Mar 2026 08:30

Job Snijders: That we got it spelled correctly on this slide. And it is 4:30. May I ask that the people that enter the room close the door to avoid a little bit of the echoing from our friends in the hallway? Thank you kindly.

Welcome to IETF 125. Hopefully, this meeting will mean that we fix all the issues in BMP and BGP internet routing and no more meetings are required. Everybody, please be focused for the next minutes. My name is Job Snijders. I am your co-chair and next to me is Paolo Lucente, the other GROW chair. And in the back we have our secretary. Where? Oh, sorry, not in the back. You moved! Thank you for being here.

We have a pretty packed agenda today and I'm quite excited to go through all of it. But before that, I want to remind you of the Note Well. And the Note Well can be summarized as: please be excellent towards each other. When we engage in debate, be sure to focus your argumentation on the arguments and not on the persons. Thank you for that.

Here are a few resources. I will be monitoring the Zulip chat room. If you're participating in a session, please make sure that you scan the QR code that is hanging off the microphones. That way we know how many people are in the room and who is in the room, and this information can be used to plan future meetings. So if you haven't done so already, make sure to scan the QR code and log in with your Datatracker account.

As minute taker, we have our secretary, so that problem is solved. And then we will go on to our agenda for today. The agenda today has an extra item that was not communicated on the Datatracker uploaded agenda because it was a last-minute addition, but we received a request to talk about a problem where BGP and RPKI and internet routing sort of intersect, and maybe this working group can offer some perspective. But we'll first work through the planned presentations and then, time permitting, we have room for the last item. Are there any comments or suggestions on the agenda? Going once, going twice, the agenda is final.

With that, I would like to invite Thomas Graf to share with us an update on the BMP YANG model for network telemetry messages.

Thomas Graf: Thanks a lot, Job. So, it's actually an introduction to a new document. It's a document we are presenting now this IETF at GROW and also at NMOP, because it has to do with what we're doing with the so-called message broker integration. We did so far YANG push, and since here GROW is BMP, we found that it's important that we're also showing this document here and gathering feedback from you. Apologies, we will do some clicks to get your slides on the screen, because a few seconds...

Perfect. Let me try the clicker. Yes. So, at NMOP, we have currently multiple message broker documents which basically describe how YANG data collected through YANG push can be seamlessly integrated into a message broker, so like Apache Kafka, Apache Pulsar as examples. Because today we have network observability, network analytics applications who require network data, and with this integration, we are basically automating the data processing chain. We have a so-called telemetry message, which is a YANG schema defining the basic schema between the message broker producer and consumer. And we have a YANG push extension to the telemetry message. And in another document, we are describing basically a naming concept for topics—and I will explain later what topics means—and also how the data can be indexed efficiently in the message broker itself. And here, the document describes basically a BMP extension to the telemetry message and also how basically the BMP properties can be used in the topic naming and indexing.

To go a bit back to where we are actually heading: we want to get our network engineers away from doing show commands on a router, and we want them to be performing SQL commands on a message broker. So, in order to achieve that, I mean, we know BMP, huh? We know how BMP works on the wire. Probably you're not so much aware of what a message broker is. Just to give a few words: basically, a topic is the channel where you put things on the wire and where you can produce and consume. A subject basically means you can have different kinds of schemas in a topic, so think of message types in BMP. And then you can have partitions. Think about QUIC, for instance, as a transport where you have several streams; in the same way you are going to have several partitions. A message is obviously what we are sending over the wire between the producer and consumer on a topic. And then we have keying on those messages, and we can perform compaction on data, or state compression.

Now, YANG has a data taxonomy, has a structure. There are YANG nodes, there is a schema tree, it's dimensional data. So, we have at GROW with draft-ietf-grow-bmp-yang a model where we can do the subscription on the network device and basically subscribe to BMP but also obtain operational statistics from the BMP process itself, not to be confused with BGP itself. And now once the subscription is done, what that means is basically the BMP data is arriving at the data collection and from there we do the schema registration towards the schema registry. And for all the different BMP message types, we are getting like dedicated schema. And on the right-hand side, you see the YANG schema tree. So, in other words, if we go back the slide and you see at the bottom the BMP information on the wire with the BGP PDUs being mirrored, we are doing a transformation into a YANG structure. So, at the end, when we are producing that message on the wire and on the other side on the data consumer, the data consumer doesn't have to be directly aware of how to decode BGP. That's part of the data collection and basically the data consumer can solely focus on either just consuming JSON messages on the wire or, if he's YANG-aware, can obtain the YANG information from the schema registry.

So, to talk a bit about: in BMP we have different message types and when you look at the document on the telemetry—on the message key document for the YANG push, we are proposing an addressing scheme to separate basically the different kinds of metrics into statistics, states, and state changes. And we are describing now in this BMP telemetry message protocol how this can be applied also to the BMP message types and to the BMP data. An example at the bottom: this is how basically the topic could be named for BMP. So, we have like the project, the environment, then if it's state statistics or state changes, and then local RIB, the kind of RIB, and then the type of basically messages we have at BMP.
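The topic naming idea described here can be illustrated with a minimal sketch. It is not taken from the draft or the slides: the dot separator, the category strings, and the example component values (`acme`, `prod`, `local-rib`, `route-monitoring`) are all assumptions chosen for illustration. Only the order of the components (project, environment, category, kind of RIB, message type) follows the talk.

```python
from enum import Enum


class Category(Enum):
    # The three metric categories named in the talk; the string values are assumptions.
    STATISTICS = "statistics"
    STATE = "state"
    STATE_CHANGE = "state-change"


def topic_name(project: str, environment: str, category: Category,
               rib: str, message_type: str, sep: str = ".") -> str:
    """Join the naming components in the order described in the talk."""
    parts = [project, environment, category.value, rib, message_type]
    for part in parts:
        # Reject empty components and components that would break the hierarchy.
        if not part or sep in part:
            raise ValueError(f"invalid topic component: {part!r}")
    return sep.join(parts)


# Hypothetical usage: all argument values are made up for the example.
print(topic_name("acme", "prod", Category.STATE, "local-rib", "route-monitoring"))
```

A fixed component order like this lets a consumer subscribe by prefix or pattern, which is presumably why the category sits ahead of the RIB and message type.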

Then the keys: so, basically how to index on the message broker. And this becomes very much relevant not only for the distribution among the different partitions, but also for how messages can be compacted. Here we are essentially taking the keys out of the BMP per-peer header information, plus, for route monitoring and route mirroring, some information obtained from the BGP PDU, like address family, sub-address family, and NLRI or prefix, for instance.
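To make the role of the message key concrete, here is a minimal sketch of key-based partitioning and compaction. It is a simplified model, not the draft's key definition and not a real broker client: the `RouteKey` field set, the hash-based partitioner, and the convention that a `None` value acts as a tombstone are assumptions for illustration.

```python
import hashlib
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class RouteKey(NamedTuple):
    # Fields loosely following the talk: per-peer header data plus AFI, SAFI, prefix.
    peer_address: str
    peer_as: int
    peer_bgp_id: str
    afi: int
    safi: int
    prefix: str


def partition_for(key: RouteKey, num_partitions: int) -> int:
    """Map a key to a partition deterministically, so one route stays in one partition."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def compact(log: Iterable[Tuple[RouteKey, Optional[dict]]]) -> Dict[RouteKey, dict]:
    """Keep only the latest value per key; a None value (tombstone) removes the key."""
    latest: Dict[RouteKey, Optional[dict]] = {}
    for key, value in log:
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}


# Hypothetical usage with documentation addresses and AS numbers.
k = RouteKey("192.0.2.1", 64496, "192.0.2.1", 1, 1, "203.0.113.0/24")
log = [(k, {"next_hop": "192.0.2.1"}), (k, {"next_hop": "192.0.2.2"})]
print(partition_for(k, 8), compact(log))
```

Because every update for the same route carries the same key, compaction can discard superseded announcements while keeping per-route ordering inside a partition.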

Now, at the end, basically through a streaming catalog, basically an AI application or a user can discover which BMP information is available on the message broker. That information comes from the schema registry. So, in the YANG information, we have directly the schema, but also the semantics basically describing what kind of data we have on the message broker, so it's easy consumable. That's the overall picture where you see from the left, from the BMP subscription on having the BMP data on the wire, doing the transformation, register the schema until at the very end on the YANG data consumer.

There are several use cases at the end. Just pick one, for instance, a network controller. For us, it's quite important that basically not each application who requires data is obtaining that data directly from the network. We want to collect once and make that over a message broker available for multiple use cases. And one can be, for instance, a network controller.

Now, I have a couple of questions to NMOP and also to GROW. First of all, does it make sense that we are doing the transformation at the data collection from BMP, and in the future also from IPFIX data, to YANG, making it consumable for JSON consumers, but also having one unified schema language describing the BMP metrics and the IPFIX metrics? Then, the second question: which working group? Shall it go into NMOP, where we already have the message broker documents, or shall it go towards GROW, where we are handling the BMP information? One thing I also need to remind you of here is that we are reusing a BGP model from IDR, and we have related information later in the back. There is one YANG module where we define basically the RIB structure, and that we would like to preserve in the document itself. These are the questions which we sent to the IDR working group, and we already got some feedback from the authors. These are the next steps we would like to perform on the document itself. And I see Prasad is in the queue.

Job Snijders: We have a few seconds for a question, so please be fast.

Prasad: Sure, thank you. Hey Thomas, thank you for sharing this. This is Prasad from Cisco. Maybe I'll just stick to my top two questions. I had four or five questions. I think from what I understood, you're saying all the translation happens on the collector, nothing on the router. So everything is happening on the consumer side, is it?

Thomas Graf: That's correct.

Prasad: And then I can understand why you would do this if it's native telemetry data. I find it interesting that you're proposing doing this translation for BMP also, which is already standardized. Like, have you—I mean, what are your thoughts on that? And then have you done any hackathons to measure the impact of this translation from BMP data to telemetry data, maybe? Sorry, the—what you're proposing here.

Thomas Graf: Sure. So, maybe to answer directly the second one: doing transformation from BMP to JSON and ingest in Kafka, we are doing that since years. So, this is nothing new. What is really new is basically having a YANG schema for it. And here we believe, I mean, IETF, especially the IDR working group and the IETF BGP YANG authors did a fantastic job in describing BGP in YANG. And if we want to make BGP data really easily consumable, we believe we should use that semantic information because I'm just coming from OPS area and there we saw from other network operators how important it is to preserve semantics from the network. Does that answer your questions?

Prasad: Yeah, thank you. I had a few more, but I'll follow up with you on mail. Thank you.

Thomas Graf: Sure, thanks.

Job Snijders: I'm afraid we're at the end of the slot for this particular presentation. So I would encourage people to contemplate the questions that Thomas Graf posed and reply on the mailing list. Thanks. Next up, we have RPKI requirements for monitoring RPKI-related processes on routers using BMP.

Shuhao Wang: Okay, hi, I'm Shuhao Wang from Zhongguancun Laboratory. So I'll be talking about requirements for monitoring RPKI-related processes on routers using BMP. For anyone who's not familiar with our requirements, I'll first give a recap. We have summarized all the RPKI-related activities on the routers into four stages. The first two stages are at the policy level and correspond to how the router acquires RPKI data and how the RPKI-related policy is configured. The last two stages correspond to how each individual route is handled: how the individual route is validated, and what effects the validation results have on the routing actions. And none of these stages is supported by current standard BMP.

So, today I'll be talking about the key changes from the last version. The key change in stage one is that we extended from RTR-only to multiple RPKI data sources. For stage two, we have introduced route features to describe large-scale validation rule sets. For stage three, we have an enhanced validation structure that includes not only the per-rule validation state but also an overall validation state. And for stage four, we propose a dedicated RPKI impact message. All these changes were inspired by an investigation of a real router, the Huawei Route Engine 8000.

So, for stage one, the key finding is that the RPKI data could come not only from RTR connections but also from BGP connections and static configurations. So to describe all these diverse sources, there should be common parameters to describe the sources and also some source-specific parameters to report each source. For stage two, we originally proposed to transfer all the rule sets on the router through BMP, but we found they could be huge. So in this version, we instead report the route features through BMP, and the administrator on the controller side can just combine the features and per-route info to reconstruct the RPKI rules on the router. And for stage three, we found that there could be multiple validation types, not only a range but also path, region. So there should be an overall state to describe all the validations combined. The overall state is valid only when all the matched rules are valid, and invalid when at least one matched rule is invalid.
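The rule for combining per-rule results into an overall state can be sketched as follows. This is an illustration, not the draft's encoding: the `State` enum and its `NOT_FOUND` member are assumptions, and the behaviour when no rule matches at all (returning `NOT_FOUND`) is a guess, since the talk only defines the valid and invalid cases.

```python
from enum import Enum
from typing import List


class State(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"  # assumption: a validation type with no matching rule


def overall_state(per_rule: List[State]) -> State:
    """Combine per-rule validation states into one overall state."""
    matched = [s for s in per_rule if s is not State.NOT_FOUND]
    # Invalid as soon as at least one matched rule is invalid.
    if any(s is State.INVALID for s in matched):
        return State.INVALID
    # Valid only when there are matched rules and all of them are valid.
    if matched:
        return State.VALID
    # Assumption: nothing matched, so the route is neither valid nor invalid.
    return State.NOT_FOUND


# Hypothetical usage: one validation type passes, another fails.
print(overall_state([State.VALID, State.INVALID]))
```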

And here I mainly want to address two comments. The first comment suggests that reporting RTR protocol state in BMP may face scope creep, since YANG modeling and streaming telemetry already cover RTR. We fully agree with the idea, but we want to emphasize that this requirement is focused on the effect of RPKI data on routing decisions, not the operational details of the RTR protocol itself. Also, in this draft we have already shifted from monitoring RTR connections to monitoring RPKI data sources, a more abstract framing. There are also ongoing discussions around this comment on the mailing list, where it was suggested that we consider the CCR hash as a compact database version identifier, and we will consider this in our future revision. The second comment suggests that we consider monitoring SLURM, which locally overrides RPKI validation rules. Actually, in this draft it is partially covered by the static configuration source in stage one, but SLURM-specific monitoring, which means tracking which rules are locally overridden, is indeed a viable addition, and we plan to discuss this for the next revision.

We haven't done this yet, but we found that the current draft proposes too many message types, and we're going to simplify this in a later revision. So, to wrap up, we have listed all the discussion points, especially the message type consolidation and the BMP boundary. For the next steps, we're going to incorporate the feedback, investigate more vendors' routers beyond Huawei, and integrate with more emerging RPKI mechanisms. Okay, that's all. Thank you. Any comments or questions? Thank you for your time.

Job Snijders: Next up is Mr. Paolo Lucente, who is self-servicing the clicker. Nice.

Paolo Lucente: All right. So, first of the two presentations about the TLV draft, right? So, first things first, we have two new co-authors, which is Maxence and Pierre. They were doing the SNTS, I think, draft that got merged, swallowed into this one. And the draft was about timestamping, sequence number, and extended flags. And so, welcome to the two new co-authors. And then another thing already for this slide is that you can see that the title changed a little bit, actually shortened, right? So before, the original scope was like introducing TLVs only for route monitoring messages and peer down. But then over time, I mean, we saw that, you know, the stats message, it was not formalized that it could have TLVs at the end. And then we are introducing other TLVs that apply to all messages and things like that. So, to reflect the broadened scope, I shortened the title essentially.

This is the usual recap slide, so I will just skip it. What are the changes since Montreal? So first thing, I just said it, SNTS got merged into this document. Essentially, like a timestamp becomes mandatory, like at the moment in BMP version 3, I mean, you can have or not have a timestamp, which probably for a monitoring protocol is less than ideal, let's say. And then sequencing and extending the life of the flags, because, you know, the flags are in the per-peer header, they are very short, we are almost running out of flags. So we have new TLV where we can set extended flags. We formalized the support for trailing TLVs in the stats report message. I went for a container TLV for the stats where you have still the number inside and then you can have other TLVs like in all the other messages. There was some feedback from Prasad and Dhananjaya and I processed that.

So next steps and open questions: of course, we still have a little bit to do to smooth the integration, let's say, of the SNTS draft. For example, we have a timestamp in the per-peer header and now we have a timestamp TLV. So what do we do with that timestamp in the per-peer header? Shall we elect one timestamp to go there or not? Extended flags, for example, at the moment are in addition to the flags in the per-peer header, but of course you have different models: they can be in addition or in replacement. So, I think we are not religious. I mean, we talked with the other co-authors, and we would be really open to feedback about all of this. On the point of numbering of the TLVs, it would be nice if, at least for the TLVs that apply to all the messages, like timestamp, sequence, and flags, they get the same code points across all the messages. So we have yet again a question: shall we go uniform, with the same code point for all the messages, or take the next available one depending on the message type, right? Then we have the ordering. The ordering of the TLVs has been very much discussed. We have a "should" in the document that says that we should order the TLVs by code point, right, when an exporter is packing. Let's say this is creating a number of little issues, like, okay, which TLV should then come before the other, right? Do I need flags before groups, before the BGP PDU, and things like that? As a consumer, I was doing the hackathon last weekend and I understood, and this is my very personal position, that I don't care about any sort of ordering. I just get all the TLVs at the very beginning when I'm processing the message, and after that I cherry-pick what I need and I implement my logic. And then, yeah, the IANA feedback was processed. It was very useful. That's it. So I don't know if there are any questions. Otherwise, I have my second presentation, about the route event logging.
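The consumer-side approach described here, reading all TLVs first and then picking what is needed, can be sketched as below. The layout is a simplification and an assumption: a 2-byte type and 2-byte length followed by the value, as in the RFC 7854 Information TLV, whereas the TLV draft may carry additional fields. The code point numbers are hypothetical.

```python
import struct
from typing import Dict, List

TLV_HEADER = struct.Struct("!HH")  # assumed layout: 2-byte type, 2-byte length

# Hypothetical code points, for illustration only.
TIMESTAMP, SEQUENCE, EXT_FLAGS = 1, 2, 3


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    return TLV_HEADER.pack(tlv_type, len(value)) + value


def parse_tlvs(buf: bytes) -> Dict[int, List[bytes]]:
    """Collect every TLV into a type-indexed map, regardless of wire order."""
    tlvs: Dict[int, List[bytes]] = {}
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < TLV_HEADER.size:
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(buf, offset)
        offset += TLV_HEADER.size
        if len(buf) - offset < length:
            raise ValueError("truncated TLV value")
        tlvs.setdefault(tlv_type, []).append(buf[offset:offset + length])
        offset += length
    return tlvs


# The same TLVs packed in two different orders give the same map.
a = encode_tlv(TIMESTAMP, b"\x00" * 8) + encode_tlv(SEQUENCE, b"\x00\x00\x00\x01")
b = encode_tlv(SEQUENCE, b"\x00\x00\x00\x01") + encode_tlv(TIMESTAMP, b"\x00" * 8)
print(parse_tlvs(a) == parse_tlvs(b))
```

With this style of parsing, ordering by code point matters only to the exporter or to consumers that process the stream incrementally, which matches the distinction drawn in the discussion that follows.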

Job Snijders: Job Snijders, a question as working group participant. Have you considered the cost of ordering and the potential future applications of ordered data in terms of replicating the data versus not ordering? Because if ordering is cheap, it allows some future possibilities that if the data remains unordered are, yeah, you close some doors.

Paolo Lucente: Right. As a consumer, I just understood that for me it's irrelevant whether they are ordered or unordered. But from the exporter perspective, we understood that there is a preference for ordering.

Job Snijders: Yeah, okay. That makes sense. Thank you.

Paolo Lucente: All right. So, yeah, the REL draft, Route Event Logging. Just one word on the intuition: with route monitoring, we have the synchronization of the RIB from a peer, or from an exporter for all its configured peers, to a BMP station. With mirroring, we can cover other use cases. We have peer up and peer down for session reporting. We have stats. What we are missing is a message type that is event-driven, for use cases like alerting and reporting, or to simplify on-change analysis so that you don't have to collect all the RIBs, or several RIBs, and then do a differential analysis, right? So there could be an exporter already telling you what is changing.

So, what are the changes since IETF 122? Because this draft actually has not been presented for the past two meetings. The first thing is something that we had out there for a long time: shall we generalize this message type or not? Because before, we were tied to having a BGP PDU in this message type. Now that is optional. And so you can have routing events, where you have the per-peer header and, you know, the BGP PDU, or you can have, at this moment, health events, which we defined just because we got a use case for health events, right? I have an example later. So essentially, after the BMP common header, we now have an event type, and depending on that, the structure can change. There has been quite some attention on the validation fail TLV, which in the end is something that at the moment is carrying some RPKI information. Lancheng was proposing to have some reason codes for why things are failing, which made sense. And then Feather from DT: this is something that will come in 06, which has not been posted yet. Essentially, Feather was saying, in very short, why only validation fail, right? This thing should be validation state change, because maybe something transitions from fail to valid and you want to track that state as well, right? So essentially it's a generalization. And then I added some wire format examples to help people, you know, parse what's going on.

So, the example for routing. I mean, it's, you know, what we had before. We have the common header, then the event type header, which essentially says one, routing. And then you have the per-peer header and the BGP PDU and any other TLVs. It's essentially what we had before. For health, and this is a use case that was raised by Mukul, we have the BMP common header, then the event type is health event, let's say number two. And then you see we don't have a BGP PDU, we don't have a per-peer header. We have just the event reason, which is log action, and as part of more information on the log action, we have this example that, you know, Mukul proposed, which is essentially that you have an unstable peering, so it has been flapping and you have five occurrences of that in some hundred seconds' time, right? So yeah, it made sense. So we are going in this generalized direction right now. But the health example was just, you know, an example. So the base question from Jeff from many meetings ago remains, which is: what does this message type want to be as a grown-up? We are generalizing it. Probably we want to collect more use cases. The presenters after me will talk about the REL enhancements. So, for example, they would be great reviewers for this draft. They can say whether, for example, their data fits in the current structure for their use case, or whether they have any other suggestions. So that's it for me. I see we have Prasad in the queue.
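The generalized structure, where an event type after the common header decides what follows, can be sketched as a small dispatcher. Only the 6-byte BMP common header (version, length, message type) and the 42-byte per-peer header size come from RFC 7854. The 1-byte width of the event type field, the values 1 and 2 (mentioned in the talk only as "let's say"), and the `REL_MSG_TYPE` placeholder are assumptions, and the BGP PDU and TLVs are left undecoded.

```python
import struct
from dataclasses import dataclass
from typing import Union

COMMON_HDR = struct.Struct("!BIB")  # version, message length, message type (RFC 7854)
PER_PEER_HDR_LEN = 42               # per-peer header size in RFC 7854
REL_MSG_TYPE = 251                  # placeholder, not an assigned code point
EVENT_ROUTING, EVENT_HEALTH = 1, 2  # assumed values, following the talk's examples


@dataclass
class RoutingEvent:
    per_peer_header: bytes
    rest: bytes  # BGP PDU and TLVs, not decoded in this sketch


@dataclass
class HealthEvent:
    body: bytes  # event reason and TLVs, not decoded in this sketch


def build_rel(event_type: int, body: bytes, version: int = 3) -> bytes:
    length = COMMON_HDR.size + 1 + len(body)  # length covers the whole message
    return COMMON_HDR.pack(version, length, REL_MSG_TYPE) + bytes([event_type]) + body


def parse_rel(msg: bytes) -> Union[RoutingEvent, HealthEvent]:
    if len(msg) < COMMON_HDR.size + 1:
        raise ValueError("message too short")
    _version, length, _msg_type = COMMON_HDR.unpack_from(msg)
    if length != len(msg):
        raise ValueError("length field does not match message size")
    event_type = msg[COMMON_HDR.size]
    body = msg[COMMON_HDR.size + 1:]
    if event_type == EVENT_ROUTING:
        if len(body) < PER_PEER_HDR_LEN:
            raise ValueError("routing event without a full per-peer header")
        return RoutingEvent(body[:PER_PEER_HDR_LEN], body[PER_PEER_HDR_LEN:])
    if event_type == EVENT_HEALTH:
        return HealthEvent(body)  # no per-peer header, no BGP PDU
    raise ValueError(f"unknown event type {event_type}")


print(type(parse_rel(build_rel(EVENT_HEALTH, b"flapping"))).__name__)
```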

Prasad: Hey Paolo, this is Prasad. I'm just thinking, based on where this is heading and some of the use cases we got for the generic event notification draft, where we also got a couple of use cases for health: maybe we could discuss and see if we can merge both of them. Then this becomes even more generic, across routes and also something outside of routes, like a neighbor event at a router level, you know, or whatever else we had, maybe five use cases, in the generic event. So we can sort of define a structure for that. We already have a structure for routes, and we can sort of look at how to merge all of this, maybe. I can discuss that with you offline.

Paolo Lucente: Sure, sure, makes sense. Makes sense. Yeah, thank you.

Job Snijders: Any other questions or comments? Nobody in the queue. That's it. Thank you. Thank you, Paolo. Next up, I would like to invite Nan Geng about log more routing events in the BGP Monitoring Protocol.

Nan Geng: Good afternoon, everyone. I'm Nan Geng from Huawei Technologies, and I will introduce an enhancement for the BMP route event logging just introduced by Paolo. This draft does not define a new type of routing event. We will define some new code points for the log action TLV.

Okay, first in the draft, we propose to log some routing events for BGP Flowspec. BGP Flowspec has been used for eliminating malicious traffic, traffic engineering or some other purposes, but there may be some failures on the control plane of Flowspec. So if we can use BMP to report some routing events of this protocol, it will be helpful for the operators to know what happened on the control plane of Flowspec. And in this document, in the current GROW BMP REL draft, log action TLV has been defined. And in the first byte of the TLV is the code point. The code point indicates what kind of routing events happened in the corresponding NLRI. And in this document, we propose four new code points for BGP Flowspec under log action TLV, and they are Redirect to VRF fail, Redirect to Next Hop fail, Redirect to SR Policy fail, and Validation fail. And with this information, we can know—help operators to know why the log was made in the protocol.

And second, we also propose to log some routing events for BGP SR policies. Similarly, we define a new code point under log action TLV in this document especially for BGP SR policies and that is Invalid Candidate Path. And the data part will contain a—can contain a string to add some additional information to explain why the log was made. And that's all. Thank you. Comments are welcome.

Job Snijders: Jeff, you may enter the queue. Sorry, you're in the queue. Please enable your microphone and video if you want.

Jeff Haas: Don't need for the video. Jeff Haas. So I've made this comment already once this IETF, so this is not specifically directed to you, but we have to be very cautious when we're using text strings as part of our protocol elements. The problem we have with UTF-8 is that it is a format that is challenging to parse. There's a lot of different rules about what to do about displayable characters. And much like I said to the TSV working group about this sort of thing, the question that should be getting asked is: what do you expect a receiving application to do with this information? And certainly for this use case for Flowspec, being able to get some debuggability about what is failing is useful. Ask yourself instead: is it better to have a structured data for this instead of a free-form text string?

Nan Geng: Okay, thank you, Jeff. Thanks for the comments. About the use case: for Flowspec, we define two kinds of code points. One is for redirection action failures and the other is for validation failures. For the first type of failures, there may be some reasons that the Flowspec redirection, I mean, for example, the IPv4 or IPv6 next hop, doesn't work as expected in the Flowspec control plane. And for the second type of failures, validation failures: for example, the Flowspec RFC recommends that the NLRI should contain the destination IP addresses, and the destination addresses in the FIB table should point to the origin AS. And if the validation failed, we can know that there should be a bad configuration on the Flowspec plane. Okay, that's my explanation.

Prasad: Just a quick comment from my side. This is Prasad. I think we just have to be careful about what kind of events we pick. We have to be careful especially with the ones which are very chatty, because anything route related, if it's next hop related, they can get very chatty if something is unstable. I think that's something we have to be just mindful of. Maybe it's not specific to the use case, but in general the direction we take for route events, I think that is something we have to be careful about. That's all. Thank you.

Nan Geng: Thanks for comment. Thank you.

Job Snijders: Thank you so much. Next up, Nan Geng with synchronizing BMP monitoring options and state.

Nan Geng: Okay, thank you. Nan Geng. I'll introduce another enhancement for BMP. And the enhancement is for RIB view synchronization and monitoring options notification.

In this document, we focus on mainly two problems. First, due to various reasons, there may exist inconsistencies between RIB views of BMP sender and collector. Here collector is the consumer. And existing BMP protocol doesn't support a non-disruptive method to solve the inconsistency problem. And second, there is no notification mechanism to inform of the collector about the updated monitoring reporting options. For example, when the sender stops monitoring the routing information of a specific address family without the notification mechanism, the collector will store the stale or invalid routing information of the corresponding address family. So, in this document, we propose two new BMP messages. One is the Route Refresh message and the other is the Monitoring Options message.

First, I'll introduce the BMP Route Refresh message. This is a newly defined BMP message type in this document. It is used to sync the RIB view between the sender and the collector. Following the common BMP header and the per-peer header is a Route Refresh PDU. The format of the Route Refresh PDU follows the existing RFCs. In particular, the Route Refresh message contains three fields: Address Family Identifier, Sub-type, and SAFI. The definitions of these fields are also kept consistent with existing BGP Route Refresh messages. Here is an example of how to use BMP Route Refresh messages. The BMP sender can send the BMP BoRR (Beginning of Route Refresh) to the collector, and the collector will mark the corresponding RIB view as stale or invalid, or just purge it directly. Then the sender will send the route monitoring messages to the collector, and the collector will update the corresponding RIB view accordingly. After syncing all the routing information to the collector, the sender will send an EoRR (End of Route Refresh) message to the collector, and the collector can then finish the update of the RIB view, so that the sender and collector are consistent.
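The BoRR and EoRR sequence amounts to a mark-and-sweep on the collector's RIB view, which can be sketched as follows. This models the collector behaviour only and uses the mark-as-stale option rather than an immediate purge. The `RibView` class, its method names, and the use of plain strings for prefixes are assumptions for illustration, and no wire parsing is shown.

```python
from typing import Dict, List, Tuple

Family = Tuple[int, int]  # (AFI, SAFI)


class RibView:
    """Collector-side view of one peer's RIB, keyed by address family."""

    def __init__(self) -> None:
        # family -> prefix -> [attributes, stale flag]
        self._routes: Dict[Family, Dict[str, list]] = {}

    def route_monitoring(self, family: Family, prefix: str, attrs: dict) -> None:
        # A fresh announcement replaces the entry and clears the stale mark.
        self._routes.setdefault(family, {})[prefix] = [attrs, False]

    def begin_refresh(self, family: Family) -> None:
        # On BoRR: mark everything currently held for this family as stale.
        for entry in self._routes.get(family, {}).values():
            entry[1] = True

    def end_refresh(self, family: Family) -> None:
        # On EoRR: sweep entries that were not re-announced during the refresh.
        table = self._routes.get(family, {})
        for prefix in [p for p, entry in table.items() if entry[1]]:
            del table[prefix]

    def prefixes(self, family: Family) -> List[str]:
        return sorted(self._routes.get(family, {}))


# Hypothetical usage: one route disappears and one appears during the refresh.
ipv4_unicast = (1, 1)
view = RibView()
view.route_monitoring(ipv4_unicast, "192.0.2.0/24", {})
view.route_monitoring(ipv4_unicast, "198.51.100.0/24", {})
view.begin_refresh(ipv4_unicast)
view.route_monitoring(ipv4_unicast, "192.0.2.0/24", {})
view.route_monitoring(ipv4_unicast, "203.0.113.0/24", {})
view.end_refresh(ipv4_unicast)
print(view.prefixes(ipv4_unicast))
```

Marking instead of purging keeps the old routes usable while the refresh is in progress, which is the non-disruptive property the draft is after.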

Second, we define another new BMP message, the Monitoring Options message. We can just call it the MO message. It is used to solve the second problem introduced at the beginning: it syncs the monitoring options from the sender to the collector. Following the common BMP header and the per-peer header is the BMP Monitoring Options PDU. The format of the PDU is shown on the slide. We use the Type field to indicate Adj-RIB-In, Adj-RIB-Out, and Local-RIB, and we use the Sub-type field to indicate Pre-policy and Post-policy. In the Flags field, we use one bit to indicate whether the monitoring options carried in the PDU are to be enabled or disabled. After the Length field, there is a list of pairs of Address Families. So a sender can use the MO message to notify the collector which kind of routing information of a specific address family will be monitored on the sender. Besides these three types of MO message, we also define a fourth type, Statistics. Similarly, we can use the MO message to inform the collector that specific statistics types will or will not be monitored on the sender.

Here are the examples. In the example on the left, the sender disabled monitoring of the IPv4 Multicast address family. The sender can send the MO message to the collector, and then the collector will purge all the routing information of the IPv4 Multicast address family. The example on the right shows that the sender enabled monitoring of the IPv4 Multicast address family. The sender can also send an MO message to the collector, and then the collector will know that the sender will continuously report the RIB routing information of the IPv4 Multicast address family in the next step. Okay, thanks.
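The two examples can be sketched from the collector's side. This is a model of the behaviour described in the talk, not of the PDU encoding, which is only shown on the slide: the `MonitoringOptions` dataclass, its field names, and the `Collector` class are assumptions. AFI 1 with SAFI 2 is IPv4 Multicast.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set, Tuple


class RibType(Enum):
    ADJ_RIB_IN = "adj-rib-in"
    ADJ_RIB_OUT = "adj-rib-out"
    LOCAL_RIB = "local-rib"


@dataclass(frozen=True)
class MonitoringOptions:
    rib_type: RibType                     # Type field
    post_policy: bool                     # Sub-type field: pre-policy or post-policy
    enable: bool                          # the enable/disable flag bit
    families: Tuple[Tuple[int, int], ...]  # list of (AFI, SAFI) pairs


View = Tuple[RibType, bool, int, int]


class Collector:
    def __init__(self) -> None:
        self.monitored: Set[View] = set()
        self.routes: Dict[View, Set[str]] = {}

    def on_mo(self, mo: MonitoringOptions) -> None:
        for afi, safi in mo.families:
            view = (mo.rib_type, mo.post_policy, afi, safi)
            if mo.enable:
                # Sender will start reporting this view; expect route monitoring next.
                self.monitored.add(view)
            else:
                # Sender stopped reporting; drop what would otherwise go stale.
                self.monitored.discard(view)
                self.routes.pop(view, None)


# Hypothetical usage: IPv4 Multicast is enabled, populated, then disabled.
ipv4_mcast = (1, 2)
c = Collector()
c.on_mo(MonitoringOptions(RibType.ADJ_RIB_IN, False, True, (ipv4_mcast,)))
view = (RibType.ADJ_RIB_IN, False, 1, 2)
c.routes.setdefault(view, set()).add("233.252.0.0/24")
c.on_mo(MonitoringOptions(RibType.ADJ_RIB_IN, False, False, (ipv4_mcast,)))
print(view in c.monitored, view in c.routes)
```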

Job Snijders: There is one person in the queue. Prasad, please unmute your microphone.

Prasad: Sorry, I think I didn't unclick after last time, but it's okay. I'll make one comment. Maybe you should talk to Luke. Luke was looking at something similar. He had explored Route Refresh messages and he has a draft on this. I had pointed him to your draft. I can send you the notification. You mean Luke from NLnet Labs? Yeah, Luke H. I can send the email ID to the authors.

Nan Geng: Okay, thank you.

Job Snijders: Thank you so much. Next up, Changwang Lin for Extension for BMP Peer Header. We need to reset the timer.

Changwang Lin: This is Changwang Lin from H3C. This draft is about an extension for the BMP peer header. First, let's look at the background. In the left figure, there are two routers, Router A and Router B, which establish two parallel direct EBGP peers. These peers are established based on IPv6 link-local addresses, which are learned from the ND protocol. These two parallel peers have the same peer address, peer AS number, and peer BGP ID. As shown for the current BMP report message in the right figure, there is no additional interface information in the per-peer header. So the BMP monitoring stations cannot correctly distinguish these two interface BGP peers.

The solution is simple. On the BMP producer, for common BGP peers, the identification is composed of peer address, peer AS, and peer BGP ID. For BGP peers established over an IPv4 unnumbered address or an IPv6 link-local address, the identification is composed of interface index, peer address, peer AS number, and peer BGP ID. For the BMP collector, the identification is the same as on the BMP producer. So we need the interface index added to the BMP report message: the peer interface index must be added to the per-peer header to distinguish these two parallel interface BGP peers.

Finally, we can see the protocol extension. In this draft, we define two new BMP peer types. The first type is the Global Interface Peer Type. The second type is the RD Interface Peer Type. When these two new BMP peer types are used, the BMP message must include the peer interface index in the per-peer header. So the BMP collectors can correctly distinguish the BGP interface peers by the additional peer interface index. That's all. Any questions or comments are welcome. Thanks.
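The identification problem and the proposed fix can be shown with a small sketch of how a collector might key its peer table. The `PeerKey` type and `make_key` helper are hypothetical, and the addresses, AS number, and interface indexes are illustrative values, not taken from the draft.

```python
from typing import NamedTuple, Optional


class PeerKey(NamedTuple):
    """Hypothetical collector-side key identifying one BGP peer."""
    peer_address: str
    peer_as: int
    peer_bgp_id: str
    if_index: Optional[int] = None  # only set for interface peers


def make_key(peer_address, peer_as, peer_bgp_id, if_index=None,
             interface_peer=False):
    # Interface peers (IPv4 unnumbered or IPv6 link-local) add the
    # interface index to the usual address / AS / BGP ID triple.
    if interface_peer:
        if if_index is None:
            raise ValueError("interface peers need an interface index")
        return PeerKey(peer_address, peer_as, peer_bgp_id, if_index)
    return PeerKey(peer_address, peer_as, peer_bgp_id)


# Two parallel sessions to the same router over two links: same link-local
# address, same AS, same BGP ID.
without_a = make_key("fe80::1", 64512, "192.0.2.1")
without_b = make_key("fe80::1", 64512, "192.0.2.1")
with_a = make_key("fe80::1", 64512, "192.0.2.1", if_index=3, interface_peer=True)
with_b = make_key("fe80::1", 64512, "192.0.2.1", if_index=4, interface_peer=True)

print(without_a == without_b, with_a == with_b)
```

Without the interface index the two sessions collapse into one key. With it, they stay distinct.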

Job Snijders: I saw Maxime in the queue first. So, Pierre second, right? Am I holding it upside down? I am holding it upside down. Oh, Maxime dropped out. That's fine. Sorry, sorry. Pierre, go ahead.

Pierre: We have the same comment, I think. We were solving the same type of issue a bit differently in BMP Loc-Peer, so we may want to synchronize on this. We are trying to solve the same problem as you in BMP Loc-Peer. It was not the goal of Loc-Peer, but we got feedback on it where we had to do similar things as you. So we may want to meet and discuss and see how we should do it together. Okay, we can discuss offline. Yeah, I think the feedback was coming from one of your colleagues, actually. So yeah, the problem is real, the application is real, and as we're trying to do this in Loc-Peer already, maybe we want to merge or, well, do whatever is needed in order to work together on this. Okay.

Changwang Lin: Yeah, thank you.

Jeff Haas: Jeff Haas. Very quick comment. A different way to solve this is to just simply make the peer address longer. So in SNMP as an example, you know, it's 20 octets long if it's link-local.

Job Snijders: Job as a working group participant. Jeff, I don't fully understand how making the address longer—can you clarify?

Jeff Haas: Yeah, sure. The headache we're dealing with here is that when you're trying to deal with multiple parallel peering sessions across different link-locals, you need to distinguish the link-local sessions. You need the IF index in there somewhere. So rather than actually tuck it into the packet, you know, and it's almost exactly what we're seeing here, instead it's basically a new code point. You can see in the diagram on the right-hand side that the peer address is followed by the interface index. Well, if it's just simply a really long peer address, it's the same thing in code, maybe more compactly, depending on exactly what the proposal finalizes as. It's a very small detail.

Job Snijders: Gotcha. Yeah, thank you. Next up, BMP Statistics Information TLV by Mukul Srivastava.

Mukul Srivastava: Yeah. Hello. Am I audible here?

Job Snijders: Yeah, we can hear you. Is it possible for you to move the microphone a little bit closer?

Mukul Srivastava: Yeah, this is the Mac microphone. Is it any better?

Job Snijders: Yeah, this is good. Just speak loud and clear.

Mukul Srivastava: Okay, thank you. Hello everyone. So, in the next 15 minutes, I would be presenting three drafts and in some way they are all related to BMP statistics. So the first one we have is BMP Statistics Informational TLV.

Okay, so let us start with the problem statement that this draft is trying to address. In most current implementations, BMP statistics are usually reported on a periodic basis or triggered by specific events or thresholds. When these stats reports are sent periodically, most of them report a snapshot value, which does not capture the variation that may have occurred during the reporting interval. Network insight based on the BMP statistics feed is thus limited by the granularity of the reporting interval, because if we want to capture churn happening in the network, we have to adjust our reporting interval. So technically we have two conflicting requirements to deal with. First, assume there is an analytics service consuming this BMP feed to get network insight. Such a service will require higher temporal resolution data, which necessitates a shorter reporting interval, meaning a higher reporting frequency from the BMP producer. On the other hand, operational constraints favor a longer reporting interval to lower the volume of statistics messages on the wire, which means a lower reporting frequency. So these two requirements pull in opposite directions.

So, let's see the solution this draft is proposing. The solution is to augment the statistics report with distributional information over the reporting interval. This would allow a longer reporting interval while preserving the statistical insight required by the analytics application. The diagram here captures both the problem and the solution. On the left-hand side, we have periodic reporting happening at a fixed interval, and depending on the network events, it is possible that at those fixed intervals the BMP producer reports the exact same stats to the collector. However, the network might have gone through several churns between those reports, and whether we see them all depends on our configured interval. So this provides very limited insight and, as I said before, if we want to capture those churns as well, we have to shorten our reporting interval. On the other hand, the proposal is to pass some additional information, such as the min, max, median, and average value during those intervals. This would give us enhanced reporting with distributional statistics and allow a longer reporting interval.
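To make the difference between a snapshot and distributional reporting concrete, here is a minimal sketch of what a producer might compute over one reporting interval. The `summarize` helper and the sample values are illustrative assumptions. How a real implementation samples a gauge internally is not covered in the talk.

```python
import statistics


def summarize(samples):
    """Summarize gauge samples taken within one reporting interval.

    `samples` is a hypothetical list of values the producer observed,
    in time order. The last one is what a snapshot-only report carries.
    """
    return {
        "snapshot": samples[-1],
        "min": min(samples),
        "max": max(samples),
        "average": statistics.mean(samples),
        "median": statistics.median(samples),
    }


# Route count starts and ends at 1000 but churns in between.
samples = [1000, 1000, 4000, 250, 1000]
summary = summarize(samples)
print(summary)
```

A snapshot-only report for this interval would show 1000, the same value as the previous report, while the minimum of 250 and maximum of 4000 reveal the churn.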

Okay, so the draft proposes a Statistics Information TLV that can accompany the BMP statistics message to report additional information about the statistics. This is an optional TLV, and it can be used with any existing or future gauge-type BMP statistic. This is the proposed format for the Statistics Information TLV. The proposal is to treat this TLV like any other counter reported today in the BMP statistics message, which helps fit this TLV into the current BMP stats format. So this TLV follows the standard counter encoding format defined for any statistic, which is Stats Type, Stats Length, and Stats Data. Different counters encode data differently in the Stats Data field, so this TLV also formats it in a special way that we will see in the next slide. And since the proposal is to treat this the same as any other counter, the Stats Count field in the BMP stats report would also count this TLV. But we have to note that this TLV might appear multiple times in a single statistics report message, once for each statistics type for which this supplementary information is being reported. The Stats Type would be something new, which makes clear that this is not a real counter but an informational TLV.

The Stats Data would be encoded in this format. We have a Reference Stats Type, which is the BMP stats type for which this additional information is being reported. Then we have the number of entries and a list of entries matching that number. The individual entries are encoded like this: we have an Entry Type, which defines the type of statistical value reported in this entry, then an eight-byte value and a four-byte timestamp, the time when this event was observed, expressed in seconds. This timestamp is required for entry type 1, which is minimum, and type 2, which is maximum. It should not be present for the snapshot, average, and median types.

Job Snijders: Mukul, sorry to interrupt, but you're over time for this presentation, so you are eating into the time for your next presentation. So please keep going.

Mukul Srivastava: Yeah, sure. I'm almost done, but that's fine. We will adjust in our next presentation. Yeah, so these are the different types that are defined: Type 1 is for the minimum value observed during the reporting interval, Type 2 is for the maximum, Type 3 for the snapshot view, Type 4 for the average, and Type 5 for the median value. I think that's all. That concludes this one. I would welcome review comments and adoption by the working group. I will pause for any questions.
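The Stats Data layout described in this presentation can be sketched as an encoder and decoder. The eight-byte value, the four-byte timestamp, and entry types 1 to 5 come from the talk. The widths chosen here for Reference Stats Type (2 bytes), number of entries (2 bytes), and Entry Type (1 byte) are assumptions made only so the example runs, and the draft may define them differently.

```python
import struct

# Entry types as listed in the presentation.
ENTRY_MIN, ENTRY_MAX, ENTRY_SNAPSHOT, ENTRY_AVERAGE, ENTRY_MEDIAN = 1, 2, 3, 4, 5
TIMESTAMPED = {ENTRY_MIN, ENTRY_MAX}  # only min and max carry a timestamp


def encode_stats_data(ref_stats_type, entries):
    """entries: list of (entry_type, value, timestamp_or_None)."""
    # Assumed widths: 2-byte reference stats type, 2-byte entry count.
    out = struct.pack("!HH", ref_stats_type, len(entries))
    for entry_type, value, ts in entries:
        # Assumed 1-byte entry type, then the 8-byte value from the talk.
        out += struct.pack("!BQ", entry_type, value)
        if entry_type in TIMESTAMPED:
            if ts is None:
                raise ValueError("min/max entries need a timestamp")
            out += struct.pack("!I", ts)  # 4-byte timestamp in seconds
        elif ts is not None:
            raise ValueError("timestamp only allowed on min/max entries")
    return out


def decode_stats_data(data):
    ref_stats_type, count = struct.unpack_from("!HH", data, 0)
    offset, entries = 4, []
    for _ in range(count):
        entry_type, value = struct.unpack_from("!BQ", data, offset)
        offset += 9
        ts = None
        if entry_type in TIMESTAMPED:
            (ts,) = struct.unpack_from("!I", data, offset)
            offset += 4
        entries.append((entry_type, value, ts))
    return ref_stats_type, entries


entries = [
    (ENTRY_MIN, 250, 1773650000),
    (ENTRY_MAX, 4000, 1773650030),
    (ENTRY_AVERAGE, 1450, None),
]
blob = encode_stats_data(7, entries)  # 7 is just an example reference type
print(len(blob), decode_stats_data(blob) == (7, entries))
```

Because the timestamp is conditional on the entry type, a decoder has to inspect each Entry Type before it knows how far to advance.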

Job Snijders: Your request for a to-do item for the chairs to consider scheduling adoption is noted.

Mukul Srivastava: I think we want to do some more work on this to make up our minds on some items. This was an initial proposal.

Job Snijders: Yeah, so the request is noted. Any other questions? All right, next presentation.

Mukul Srivastava: Okay. So this presentation is about Route Change Statistics Based on Policy. Let's start with the problem statement that this draft is trying to address. We know that routing policies are widely deployed, but there is no standardized way to monitor which specific attributes are being modified and how frequently these modifications are happening. This limits the operational insight available to track route modification through specific policies, understand the policy impact, and detect misconfiguration or route behavior changes. Not all of these can be exactly verified through the proposal in this draft, but it provides some insight for all these use cases.

So the solution is to report BMP statistics that indicate which route attributes were modified by the routing policy. Stats are reported with address family information present. And since these statistics carry the BMP per-peer header, it can be inferred for which specific peer the policy is being applied. This is the format of the statistics message for reporting the policy impact on various attributes. We have two bytes for AFI, one for SAFI, then the number of attribute types included in this message, and then, based on that number, pairs of attribute type followed by a 64-bit gauge. These are the different attribute types defined in this draft today. I'm not going to read all of them, but they are mostly the BGP attributes, with one type defined for each of them. And I think that's all. We welcome any review comments and, again, adoption by the working group.
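The statistic layout just described can be sketched as a small encoder. The two-byte AFI, one-byte SAFI, and 64-bit gauge come from the talk. The one-byte widths used for the attribute count and for each attribute type, as well as the attribute type numbers in the example, are assumptions for illustration and are not taken from the draft.

```python
import struct


def encode_policy_stats(afi, safi, gauges):
    """gauges: list of (attribute_type, modified_count) pairs."""
    # 2-byte AFI and 1-byte SAFI as in the talk; 1-byte count is assumed.
    out = struct.pack("!HBB", afi, safi, len(gauges))
    for attr_type, count in gauges:
        # Assumed 1-byte attribute type followed by the 64-bit gauge.
        out += struct.pack("!BQ", attr_type, count)
    return out


def decode_policy_stats(data):
    afi, safi, n = struct.unpack_from("!HBB", data, 0)
    gauges = [struct.unpack_from("!BQ", data, 4 + 9 * i) for i in range(n)]
    return afi, safi, gauges


# Hypothetical attribute type numbers, e.g. 1 and 2 for two attributes
# that a policy rewrote on 120 and 8 routes respectively.
blob = encode_policy_stats(1, 1, [(1, 120), (2, 8)])
print(len(blob), decode_policy_stats(blob))
```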

Job Snijders: Yeah, open for questions.

Susan Hares: This is Sue Hares. In this hat, I'm IDR co-chair. Could you give me a little bit more detail on what you mean by modified by policy? Just curious, this is an interesting idea, but what do you mean? The attribute is modified when the packet comes in, you store it, and when you send it out it gets policy, or when it's sitting? Thank you, Mukul.

Mukul Srivastava: Yeah, so the proposal is that when the stats report is sent by, let's say, the router with some periodicity, at each of those intervals we report the delta that might have happened for a specific policy for a specific peer.

Susan Hares: I still don't quite understand what you mean. There are two ways an attribute can be modified. I don't know if you're referring to the wire, or if you're referring to the information sitting in the RIB, where you change it in the RIB and it hasn't gone out yet. I'm just looking for some clarity on what change means. It's very interesting and very useful. I'm just looking for a little bit more clarity in the definition.

Mukul Srivastava: Okay, shall we take it offline? Maybe I can respond to you in more detail over email.

Susan Hares: Wonderful.

Job Snijders: All right, and now your third presentation, while Paolo is furiously clicking in the web interface.

Mukul Srivastava: Okay, this should be a quick one. This is an update on our last presentation on EVPN statistics. It was presented at the last meeting and it is about EVPN-specific BMP statistics. This is a quick recap. It was defined for all five RIB views, and the counters align with the EVPN YANG model. The updates I want to share in this presentation are the new types and subtypes we have defined since last time, and per-EVI route statistics, so per-EVPN-instance route statistics, for which we have updated our formatting. This is the format of the route statistics for the global instance, and there is no change to it. This is just a recap. And again, these are some new types that we have added for different EVPN stats. I'm not going to read through all of them, but these are the new types and subtypes that were added. Then update 2 is per-EVI EVPN route statistics. This format is exactly like our global instance, with the addition of the Route Distinguisher. And that's all.

Job Snijders: Any questions on EVPN specific BMP RIB stats? Pierre, was your question on Zulip about this presentation or the previous one? Oh, okay. Sorry. I see out-of-order delivery is always, yeah. Sue, are you again in the queue or still in the queue?

Susan Hares: I'm back.

Job Snijders: All right. Does anybody have questions? Okay, it seems there are no questions for now. Thank you so much, Mukul. Then our final topic, I would like to invite Ming-Quan Huang. I would like to invite Sue Hares representing this initiative.

Susan Hares: Ming-Quan, you want to come up with me? Sure. So, sometimes when you get a mechanism in BGP... Ming-Quan and some other people presented a mechanism in BGP, and sometimes I look at a mechanism and say, "Okay, I don't really understand the problem." You know, I'm not going to build a mechanism inside of a protocol if I don't understand the problem. It's not very useful. I know it's an odd behavior sometimes. Is the clicker here? Yep. So they were describing it, and I'm introducing these guys so you can talk to them and give them some advice, because IDR received the mechanism and we said, "Hmm." They were kind enough to meet with me this week. So here's the problem. Shen-Gang is going to give you additional things, but I think it's maybe just to introduce it: come up and talk to them and work with it.

So one problem she got into is: what happens if the RPKI ROV is wrong at the source? And one thing that got my heart, because I started out working in operations, was when she said, "My boss holds me responsible if there's bad PKI data, because there are two cases where this could happen. It could happen because it originates from my network." Some of the questions I asked her: "What do people use?" And that's what I'm asking you. Come up and help them talk about what's being used in the validation, what happens if you find it's bad, how you work with the tools, how it can be quickly fixed, and whether there is a tradeoff between no bad data and bad data. So, be thinking here if you can give them some advice, because you're all in operations in this area, or you're people like me and Jeff who are interested in seeing operations work well.

The second thing is, and I don't know if I just misinterpreted this, because I think sometimes my listening across Chinese English and American English is different, that one place you get bad data from is your leaf AS, where the leaf AS doesn't have as many resources. So, again, if you're getting it in from a leaf AS, either from people in different areas or from customers, it still has to be fixed at the origin, if I understood what Job told me, and you might need some suggestions. So Ming-Quan saw this information and thought, "Well, we can fix that." He looked at the types of false positives that cause problems and at the impact of RPKI ROV coverage, and he gave three systemic recommendations. So there's a lot of data here, and I'll let Ming-Quan go through the data in his research, but the real thing is: how do the rest of you solve this problem? I don't want to build something unless we understand the problems in BGP. Maybe it's better for BMP to do it. This is really a call for help because of the details. Now, Job said I'd have three minutes, and I've done my three minutes. Are you interested to hear more, or do you want to go, and maybe those who can help Ming-Quan and Shen-Gang can come up? This is Ming-Quan's look. Did you just put yourself in the queue? Hey, that's cool. That's true chair love.

Job Snijders: Job Snijders, Fastly. Would you mind going back, I think, two slides? Now, if I got the problem right... my expertise is not in... yeah, and this is great. I think it is very important to look for a different word than bad data or false positives, because from my perspective as a relying party, it was cryptographically valid.

Susan Hares: It is absolutely cryptographically valid.

Job Snijders: And that means it is beautiful data. And if you as a resource holder misconfigured the ROA, maybe your boss thinks it was a bad move, but from the outside perspective it is good data. It is not a false positive.

Susan Hares: Yes, so we need to come to terms that are working, but someone who is inputting it, as we discussed, put bad stuff into a good system, meaning the data coming from the outside got in there and it was not the right data.

Job Snijders: Right. We'd love a new term. I think that's right.

Job Snijders: And in the RPKI, when we talk about the ROA configuration, it is an expression of the Certification Authority intent, or CA intent. So the moment a ROA is configured with some origin AS and the origin AS was a typo, the CA intent was the typo. Now, how to prevent this? I will put on my hat as a former operator at Fastly and NTT. The trick is to have all the IP resources in an IP Address Management (IPAM) system, and when ROAs are created, it is advisable to interact with the API provided by APNIC or CNNIC or whoever is the vendor of the IP resources, and confirm with the IPAM that the ROA configuration is what it should be. So in the Fastly system, the ROA creation was fully automated in the sense that the operator would request IP space from the system, the system would assign the IP space, and it would do so in a way that was error-free.
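The idea of confirming a ROA against the IPAM before it is created can be sketched as a pre-submission check. This is only an illustration under assumptions: the `ipam` dictionary stands in for a real inventory, `check_roa_request` is a hypothetical helper, and no RIR API call is shown because the talk does not describe one.

```python
import ipaddress


def check_roa_request(ipam, prefix, max_length, origin_as):
    """Return a list of problems found before submitting a ROA request.

    `ipam` is a hypothetical inventory mapping ip_network objects to the
    origin AS the organization intends to announce them from.
    """
    net = ipaddress.ip_network(prefix)
    problems = []

    # Inventory entries that cover the requested prefix (same IP version).
    covering = [
        (block, asn) for block, asn in ipam.items()
        if block.version == net.version and net.subnet_of(block)
    ]
    if not covering:
        problems.append("prefix not in inventory")
    elif all(asn != origin_as for _, asn in covering):
        problems.append("origin AS does not match inventory")

    # maxLength must lie between the prefix length and the address size.
    if not net.prefixlen <= max_length <= net.max_prefixlen:
        problems.append("maxLength out of range")
    return problems


ipam = {ipaddress.ip_network("192.0.2.0/24"): 64500}

ok = check_roa_request(ipam, "192.0.2.0/24", 24, 64500)
typo = check_roa_request(ipam, "192.0.2.0/24", 24, 64501)
unknown = check_roa_request(ipam, "198.51.100.0/24", 24, 64500)
print(ok, typo, unknown)
```

A check like this catches the origin AS typo mentioned above before it becomes the CA intent, provided the inventory itself is accurate.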

Susan Hares: What happens if some of the data you're pulling from those public sources is less than accurate? Not the ROA data, but the data you might be pulling from APNIC or from other places.

Job Snijders: So the inventory of IP prefixes that an organization holds is not external data. That is something that the organization must manage themselves.

Susan Hares: Yes, I understand, but you just said you were pulling data from some place else, and maybe I misunderstood.

Job Snijders: I meant to say that you submit the request for ROA creation to the APNIC API.

Susan Hares: Okay, if you're sending it to APNIC, that's fine. So that's one answer. What if someone who is a customer of yours puts bad data into their ROA system? How do you fix that?

Job Snijders: My personal interpretation of the consensus amongst operators: in the RPKI ecosystem, there is tooling called SLURM (Simplified Local Internet Number Resource Management with the RPKI), which provides local overrides for RPKI-derived information. But SLURM configurations stay within your own administrative domain, so your SLURM configuration does not propagate outside your AS. So by and large, that tool is not very useful to fix the mistake the customer made. And the consensus amongst operators is: consider it an AS confederation, and BGP across it, but an AS confederation still in the same autonomous system. See the first statement you made, is that what I'm hearing from you?

Susan Hares: The customer is not part of the confederation, right? The customer could be part of the confederation. No...

Ming-Quan: I mean, from... thank you. I mean, I caught your point, and I have had some discussion on the mailing list. Yes, a lot of operators take their ROA configuration very seriously, and they all have tools like inventory management for their prefixes. But we also thought that, for one thing, sometimes there are two teams, one doing ROAs and one doing routing announcements, right? That is one thing. The other thing is that the prefix owner is not always the AS holder. Some prefixes are not held by the AS holder, and the prefix holders have the right to sign their own ROA. So basically there are different parties doing this kind of thing, and they may make mistakes or misunderstand each other. That's one thing. The other thing that makes me very confused is that if everyone takes it seriously and things are getting better, why does the NIST RPKI monitor show that the number of invalids, the scale of it, is not really getting better? It's in the thousands, and recently it can even reach something like 20,000 prefixes that are invalid. And a majority of them... I don't want to say it's just not true hijack. It's not true hijack.

Job Snijders: I might have some answers for you there. Much to the surprise of many people, a lot of IPv4 space is not in use. So people will at some point in history have configured a ROA, maybe configured it wrong, but they're not using the IP space, so there is no consequence from a business perspective, but it does show up in the NIST monitor as invalid. And they don't care because they're not using the IP space. Paolo and I did extensive work to measure how much traffic goes towards invalid destinations, and our conclusion was that it is statistical noise. So that is one explanation. The other cause of the high count of invalids in the NIST monitor is that the data they ingest is often an unfiltered view that contains more-specific announcements that are not really part of the global routing system but are sent to route collectors. So the data that the route collectors receive is a slightly different flavor than regular internet routing. Those two causes contribute to what seems to be a high number of RPKI-invalid routes, but the reality is that it is statistical noise. Yes, it's multiple thousands, but it doesn't matter.

Ming-Quan: I mean, okay, that's right, okay. So, first, I don't want to say it's just not true hijack, but even if only a low number of true hijacks participate in the routing system, they will somehow hurt the business. So whether it's the operational overhead or the routing efficiency, it is essential. When we take a step back, the control plane of routing and the management plane of RPKI are somehow desynchronized. The tool you just mentioned is trying to get things better synchronized. But still, the RPKI management system is basically not automatically aligned with the routing system. So this is the fundamental thing we think they have. Yeah.

Job Snijders: Yeah, I understand some of the friction, but ultimately it is the responsibility of the vendor to help and educate the customer. And if the vendor tells the customer "We need to interconnect with single-mode," and the customer shows up with multi-mode fiber, then it's like, well, you have to abide by the rules that we as vendor set.

Susan Hares: You may want to take that. There is one other question from Shen-Gang: let's say you find the mistake, it's your own mistake, you correct it, and you want to reannounce it. What's the timing on that? How long does it take? You gave me the example of DNS taking some time to go out. I just wondered if you'd comment on that, and then I think we'll hit our end of time. You can of course continue to debate things, but I thought that was helpful.

Job Snijders: Yeah. So, the P95, as I currently understand it, in terms of ROA propagation is roughly an hour. So if you make a mistake and you correct it, it's probably propagated within an hour. But you could be unlucky and it could be two hours, or you're lucky and it's ten minutes. SIDROPS is actively working on increasing the propagation speed of ROAs. This is called the RRDP synchronization protocol, but this is in development and it will take some time. But yeah, it's in the order of an hour that things propagate, though there are many parameters in play. So yeah, don't make mistakes.

Susan Hares: And I think you've been kind enough to give the answer. I hope this was interesting to the rest. There are slides, and these guys are up here. The reason I hope you come up for a good discussion is that I'm sort of requiring a good discussion and a problem before we do any more BGP work. Thanks.

Ming-Quan: Thanks.

Job Snijders: Thank you. Well, and with this, I think we call it a day, right? We call it a session and we call it a day. We have one more slide. Oh, if you pull up the chair slides. Oh, with our goodbye slide. Yeah, it is so crazy to see millions of LEDs at night. I mean, this city is fantastic. All right, that was our last slide. Thank you for coming to the GROW meeting, and see you in Vienna. Vienna, yeah. Hooray! Thank you all.