
Session Date/Time: 16 Mar 2026 06:00

This is a transcript of the entire audio recording.

Mahesh Jethanandani: All right, it's time. This is the OPS area meeting. If you were in this room for another meeting, this would be a good time to exit. I have Med with me, I'm Mahesh, and I want to thank Thomas Graf for agreeing to take the minutes for this meeting. We would be happy if other folks also join in the note taking, especially for comments that you might have provided at the mic. This is Monday, so I'm sure you have attended at least one meeting with the Note Well. But in case you haven't: essentially, by participating in the IETF you are agreeing to all the policies and processes that the IETF has laid down, so you should be familiar with this. If you are in the room, make sure you sign into Meetecho. The on-site tool will allow you to get in the queue to ask any questions, and of course it's a virtual blue sheet for us, so we know you were here. Keep your audio and video off while you are using the Meetecho tool. For remote participants, keep your audio and video off until it is your turn to present. And of course, if you have a headset, use it; I think you'll find it helpful, and it will be easier for others to hear you. Even though you queue up and your name shows up, it is still helpful to state your name when you're asking a question or providing a comment. So here's the agenda for today. Unless there's anything else you would like to see on the agenda, we'll run through this. Okay. I'm told that should be enough for us to get started. Our first presentation is the State of the OPS Area. We are just waiting for Paolo to be here on the stage, so in the meantime we can share some of the area updates. If you can just move to the next slide.
Yeah, so this is a snapshot of the work done in the area on working group re-chartering. As you can see, almost half of our working groups have gone through re-chartering, which is really great: it gives us a refreshed core for our work. We invite all the working groups to check their charters and to let us know when there is a need for help from the ADs to have your work appropriately scoped and anchored in the charter. Next slide, please. This is also a snapshot of the leadership updates and changes across the various working groups and directorates in the area. As you can see, almost half of the working groups have seen changes, with new faces and new blood coming into the pipeline. Next slide. The slides that will be in the proceedings contain updates for all the OPS working groups and all the OPS directorates. But during this session, to optimize our time and help dive into specific working groups, we'll be focusing on two working groups this time: GROW with Paolo, and SRv6-Ops with Dhruv. And then there is a presentation on YANG Doctors from Shufang. So Paolo, please.

Paolo Lucente: Hello. Yeah, next, oh. So, the GROW working group. What is the GROW working group? It's essentially the companion of IDR: if in IDR you specify what BGP is, in GROW you say how you can operate BGP in the global internet. That is, in a nutshell, the mission of the working group. What are the main topics? BMP, the BGP Monitoring Protocol, which was originally standardized in the working group and has some follow-up work going on, and various aspects of BGP operations. The drama level of the working group is very, very low, near zero at the moment, which is excellent. We have one document that has been in last call since IETF 124, one very small document that got adopted, and another one about route leaks that was flagged dead as of last week. We had a chat with the authors of that document, Alexander Azimov and Sriram, and they essentially confirmed that the document lost its momentum: right now we have ASPA, we have other things going on, so continuing that work was just not fruitful anymore. Since 124 there has been no charter update or leadership update. And now, for those who were in Madrid for this same presentation, this is the moment I was waiting for: I finally have another slide. So, the GROW working group, as I was saying, is the home of BMP. For those who don't know what BMP is, it's the monitoring plane for BGP. For a long time we were using either BGP itself to monitor the protocol, using BGP peerings, and sometimes we could not access the information: if we wanted Adj-RIB-In or Adj-RIB-Out information, it was essentially screen scraping.
With BMP we provide a uniform platform: we have a protocol, and we get access to all the different vantage points that can be useful for BGP monitoring. It gives access to all the different RIBs, and you can monitor pre- and post-policy, as we will see in the next slides. We have the initial synchronization, as if it were a BGP peering, so we get the blast of all the routes when a peer comes up. Then we have the incremental updates: we encode the BGP PDU right in the so-called route monitoring message. And we also have statistics, peer up, peer down, and things like that. It's an extremely simple protocol. It was meant to be unidirectional: it runs on TCP, and there is no feedback, no agreement, no anything with the collector. You have a router that is exporting data, and the collector never says anything back. So it's a relatively simple protocol by design, I would say. This slide is a recap; having a diagram in the slides is always nice, right? That's why I said let's do it. You can see the vantage points: the Adj-RIB-In, the Local-RIB, and the Adj-RIB-Out, and for Adj-RIB-In and Adj-RIB-Out you have pre- and post-policy. So you have these five vantage points from which you can monitor the information as it moves through the router, the peer. And what is the status of BMP? The original RFC was very much oriented to security use cases and was limited to Adj-RIB-In, pre- and post-policy. The first thing we did was add new vantage points: Local-RIB and Adj-RIB-Out came later. That is looking back. Looking forward, we are making the format extensible.
We are making sure that every message type in BMP carries TLVs, so you can extend the core of the message with additional information, which was not the case with the original specification. We are adding event-driven messages: on top of statistics and synchronization with the peers, we have event-driven messages. Say you just want to convey that there is a status change in the validation of a prefix, for example; that would be an event. We are standardizing the YANG model for BMP. And BMP is kind of hot: I was looking at all the documents we have that were either adopted or still in individual proposal status, and we have 25 documents. So it's quite a lot; there is quite some work on BMP. But as I was saying, BMP is not the only thing GROW does. We have the near real-time mirroring protocol for IRRs, which is almost at the goal line. We have the peering API, a very interesting initiative in which we are trying to define how two systems, two ASNs, can interact through an API to establish a peering, for automation purposes, of course, for scaling out peerings en masse. And then we have a number of documents around best practices for BGP in the global internet and terminology, so things around security considerations, security terminology, and at the moment also considerations around BGP communities. So that's it for me. This is GROW; I hope it excites you enough. If you haven't attended GROW, please come: it's the session after this one. Thank you.
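The message layout Paolo describes, a small fixed common header followed by extensible TLVs, can be sketched in a few lines. This is an illustrative Python sketch assuming the RFC 7854 common header (1-byte version, 4-byte length, 1-byte type) and a generic 2-byte-type / 2-byte-length TLV encoding; the function names and the synthetic message are mine, not taken from any real BMP implementation.

```python
import struct

# BMP message types as registered by RFC 7854.
MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}

def parse_common_header(data: bytes):
    """Parse the 6-byte BMP common header: version (1), length (4), type (1)."""
    if len(data) < 6:
        raise ValueError("short read: BMP common header is 6 bytes")
    version, length, msg_type = struct.unpack("!BIB", data[:6])
    if version != 3:
        raise ValueError(f"unsupported BMP version {version}")
    return version, length, MSG_TYPES.get(msg_type, f"unknown ({msg_type})")

def iter_tlvs(body: bytes):
    """Yield (type, value) pairs from a body of 2-byte-type / 2-byte-length TLVs."""
    offset = 0
    while offset + 4 <= len(body):
        t, length = struct.unpack("!HH", body[offset:offset + 4])
        yield t, body[offset + 4:offset + 4 + length]
        offset += 4 + length

# A synthetic Initiation message with no information TLVs: the 6-byte header only.
hdr = struct.pack("!BIB", 3, 6, 4)
print(parse_common_header(hdr))  # → (3, 6, 'Initiation')
```

Because the protocol is unidirectional, a collector really is just this: read from the TCP socket, parse the header, and dispatch on the message type; it never writes anything back.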

Mahesh Jethanandani: Any questions for Paolo before we move to Dhruv? Well, I have one, in case anyone doesn't. Since you mentioned the YANG model for BMP: is this for transport over IPFIX, or over YANG push?

Paolo Lucente: No, at the moment it's for configuration only. But of course we are thinking about what the next steps could be, because once you have a model you use for configuration, you could use it for exporting data as well.

Mahesh Jethanandani: Okay.

Paolo Lucente: Thank you.

Mahesh Jethanandani: Thank you, Paolo.

Dhruv Dhody: Hello everyone. I'm Dhruv. I'll be talking about SRv6-Ops on behalf of my co-chairs Weiqiang and Dan. What is SRv6-Ops? Basically, it's a forum for network operators to come and discuss various SRv6 operational matters. The things that are hot this time: we have presentations from operators who are dealing with SRv6 in the cloud, doing service function chaining with security services, and applying SRv6 in a power grid enterprise network; various different SRv6 deployment cases this time. And we have two sessions: in one session we are focusing on the operator presentations, and then on Friday we will discuss all the drafts. The hot topics: address planning is a new draft that we discussed last time; it's on the agenda this time as well, and making good progress. The key achievement from the group, as per our template: we made our first two drafts working group drafts, the Deployment Options and the Problem Summary. And we continue to see good participation from the operators, which is something worth replicating in other places as well. So before we go into detail, why did we even create this working group? The main reason was that SRv6 work is highly distributed. We have SPRING, where we do the architecture and the main framework, but that is shared between SR-MPLS and SRv6. Any data plane enhancement happens in 6MAN. Things related to IPv6 operations go to V6OPS. All the control plane extensions that we do for SRv6 are distributed across different working groups like PCE, IDR, etc. So having a forum where we can have targeted discussions focused on SRv6 operational matters was deemed useful, and the community said: give us a forum where we can discuss this topic, because the other work is in various pieces in different working groups.
And operators are deploying SRv6 in environments which are very, very different: ISPs, cloud; just look at the examples of the different cases being presented this time. And various operational questions are being raised: What is the best migration strategy? Address planning strategy? SID strategy? How do I monitor, how do I interwork, etc.? So this forum has been a pretty good experience in getting those voices in. We hope we can deliver more in terms of actual recommendations and best practices; we have two adopted drafts, but we hope to do more in actually laying out guidelines as our next step. One thing we decided from the start: we wanted a way to be more inviting to operators, especially folks who are not active in the IETF. Always pointing them to the datatracker, which is written for those of us who are updating documents, is not the most welcoming thing for people who are new to the IETF. So we developed a working group page, and we try to point people to that. It is also maintained in our GitHub, so if people have ideas on how to do this better, or new things we can add to the website, please submit a PR. This has been pretty good. We have a very welcoming message saying where our next meeting is. And if you are an operator deploying SRv6, come talk to the SRv6-Ops chairs, and we would love to help you and guide you on how to present in one of our sessions. These are the various operators who, over the last couple of meetings, have come and given us their thoughts. The presentations have varied from "we are deploying in these different environments and these are the challenges we are facing", to people who are starting on their SRv6 journey and want to present the questions that need answering before they start. A pretty diverse set of topics.
They're all linked, with the YouTube recording, the timed PDF, and the meeting information. We are trying to make it as simple as: "Look, there are already so many operators; why don't you come and tell us what is unique about what you are facing in your network?" The two drafts for which we have finished adoption, and which we are working towards hopefully publishing: the first is SRv6 Deployment Options, mainly providing operational guidance on how to migrate, if you have MPLS or SR-MPLS and decide SRv6 is for you: what are the steps you could follow, what are the different options, what are the different tradeoffs, things like that. Not prescribing one standard way, but discussing them in an informational document. The other one, the Deployment and Operational Problem Summary, is basically a list of the various things that we hope this working group can agree are the problems we are facing, so we can then develop more documents focusing on the best ways to solve them operationally. If something is identified as a protocol extension, that doesn't belong in this group; this is only about operations: in what ways could we solve this, and what are the best practices from people who have already deployed at scale? We also have a lot of topics from individual IDs, so there is a lot of interest. People are deploying SRv6 in different deployment architectures, especially with AI back-end, DC front-end, and service function chaining, so those are being proposed. I mentioned address planning; that is something we hope to make good progress on. Traffic steering, policy selection strategies, load balancing, protection: various different things. We are still finding our way on how to combine them; there are way too many documents, and what is the best way to group them is still being discussed.
It's a new working group, so it takes time to figure those things out as well. So this is my last slide. If you are aware of more SRv6 deployments out there, and things that would be useful to document as best practices or guidance to others, or in general if you want to invite people to participate in the IETF and be active, this is a good entry point and a venue for them to come in, and we would like to encourage that as well. Thanks.

Med: Any comments for Dhruv? Yeah, I just have one question from my side. It's now almost two years that the working group has been alive, and during the chartering of this working group there was a fear about overlap with the SPRING working group. With the experience we have today, how are things really working with that working group?

Dhruv Dhody: I think having a common AD, and chairs that are talking to each other and keeping each other in the loop before we make any decision, always helps. And we have the SPRING chairs in the room; maybe they would like to add something.

Alvaro Retana: Sure. Alvaro Retana, SPRING chair. Yes, I agree with Dhruv. The overlap of the AD helps a lot to keep us honest and in our lanes. In SPRING, as Dhruv already mentioned, we coordinate with a lot of other working groups: a lot with IDR, and with 6MAN and V6OPS, so we do this with pretty much all the other groups' chairs, to coordinate and make sure that we're doing what we need to do and they're doing what they need to do. So I think it is working very well. Yes, thank you.

Dhruv Dhody: Thank you. Thank you, Alvaro.

Shufang: Hello everyone, good afternoon. My name is Shufang, and on behalf of the YANG Doctors secretary team, this is a presentation zooming in on YANG Doctors. By the way, Paul and I are YANG Doctors ourselves. First, a YANG Doctors team overview. It is required that all IETF documents with YANG modules be reviewed by YANG Doctors before IESG approval, so we are an active volunteer expert team in the IETF OPS area, and actually a mandatory review body for YANG module documents, with a key focus on the IETF NETMOD YANG standards. We also act as advisors, providing NETCONF and YANG expertise to authors in all IETF working groups and areas. Our YANG Doctors are really active in the IETF NETMOD working group: for example, some of our YANG Doctors make very active contributions to the YANG versioning work, and others lead the discussion of YANG Next. So we have close cooperation with the NETMOD working group, and we provide support to all the working groups in the IETF. If you would like more details, feel free to refer to our datatracker page, and also to the mailing list for the related discussion and to track the reviews. Next, the YANG Doctors review principles, because the majority of our work is review work. Currently we have 17 YANG Doctors; however, two of them are not available to do reviews at this time, maybe because they are too busy now. Generally, for a new review task, meaning the draft has never been reviewed by YANG Doctors, the tooling will suggest a round-robin assignment of a YANG Doctor for the review. It's also common for working group chairs to request a repeated review of a single draft, because they see that the document might benefit from another review by our YANG Doctors as the draft evolves.
In that case, the tooling will generally suggest the original reviewer give it a quick pass. We have a couple of key references that serve as our guiding principles. The fundamental standard is RFC 7950, the YANG 1.1 specification, and also RFC 8407bis, which I think was just published today, congratulations, as RFC 9907: the guidelines for authors and reviewers of YANG data model documents. There are other key compliance points, like RFC 8342, NMDA: we will check whether the YANG module in the draft is NMDA compliant, for example that it has a single hierarchy for all the configuration and operational data. The same goes for the YANG versioning work, which is actually a set of documents in NETMOD; once those drafts are approved by the IESG, we will definitely check that the revisions of YANG modules comply with YANG versioning. There is also some potential future alignment with the Velocity discussion in the OPS area working group, which is an experiment for documenting and publishing YANG modules; when that experiment is conducted, we will definitely do some alignment with it. And in the end, we always welcome feedback from authors, working groups, and reviewers to help us refine our review criteria if you think it is needed. This slide is an overview of our review data for the past year. In the past year, we received a total of 65 review requests; six of them were withdrawn and three are in progress now, so we have completed 56 requests. However, only 66% of the requests were completed on time, meaning the review was given before the deadline; the others were delayed. I guess there are a lot of reasons. One reason I see could be that a lot of our YANG Doctors are no longer very active in the IETF, so they have limited time and effort to put into the review work.
Or it is just that the YANG module is quite long or complex, and the YANG Doctor is not really familiar with the protocol discussed in that working group, so it takes time to deliver a high-quality review. The majority of the reviews are early reviews, because it is required that all YANG module drafts be reviewed during working group last call or even earlier. And some chairs may request a repeated review for the IETF last call if they see the need. The review outcomes are quite split. This is our top five reviewers, who have completed the most review tasks over the past year. A lot of thanks to Lada, Andy, Joe, Eben, and Martin; let's hear some applause. We really appreciate your time and effort; thanks for the kind support and for your contributions. We are also facing some key challenges now, so the following two slides give some key challenges and the potential actions that we might take. The very first challenge is insufficient active YANG Doctors. As I said, there is a high load on a small group of people to handle the review requests and provide technical guidance. There is also a tension between early engagement and resources: on the one hand, early engagement of YANG Doctors can help avoid the draft going further in the wrong direction and help identify issues earlier, but at the same time this means more work, and we might even need to do repeated reviews. So there is a real tension. For this challenge, we are always calling for YANG experts to join our team: if you'd like to contribute, if you'd like to do reviews, and you have YANG expertise, please feel free to reach out to us if you are interested. Another key challenge I see is about future readiness.
As YANG itself evolves, we need to continuously update our review criteria and expertise, and also prepare for potential process changes. Generally, we will align the review guidelines with evolving YANG standards like RFC 8407bis and the YANG versioning and YANG Next work. We will also track emerging processes such as Velocity: as that draft is discussed, and when that experiment is conducted, we might need to update our review criteria, and there might also be updates to our review scope. For example, we might expand our scope from just checking the syntax and semantics of YANG modules to things like the CI/CD check setup, the validity of the links to modules, etc. Finally, there is an I-D template for drafts with YANG modules, under the IETF OPS AD organization. This repo is, I think, a markdown template, with integration of YANG validation into draft development, auto-generation of YANG tree diagrams, and validation of JSON examples. So if you want to write a draft with YANG modules, feel free to use this template; we encourage its use for automated checking and generation. Okay, I think that's it.
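To give a flavor of the automated checking the template wires into draft development: the real validation is done with pyang and yanglint, but a toy pre-check along the same lines can be sketched in Python. Everything here (the statement list, the function, the example module) is illustrative only, and far weaker than what the real tools check.

```python
# Toy pre-check in the spirit of the template's automated validation.
# Real reviews rely on pyang/yanglint; this only sanity-checks brace
# balance and the presence of a few statements RFC-style modules need.

REQUIRED_STATEMENTS = ("namespace", "prefix", "revision")  # illustrative subset

def quick_check(module_text: str):
    """Return a list of problems found; an empty list means the toy checks pass."""
    problems = []
    depth = 0
    for ch in module_text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                problems.append("unbalanced braces: '}' before '{'")
                break
    if depth > 0:
        problems.append("unbalanced braces: missing '}'")
    for stmt in REQUIRED_STATEMENTS:
        if stmt + " " not in module_text:
            problems.append(f"missing '{stmt}' statement")
    return problems

# A made-up minimal module, just to exercise the checker.
module = """
module example-toy {
  namespace "urn:example:toy";
  prefix toy;
  revision 2026-03-16 { description "Initial revision."; }
  leaf enabled { type boolean; }
}
"""
print(quick_check(module))  # → []
```

The point of wiring checks like this into a CI pipeline is that authors catch problems on every commit, instead of a YANG Doctor catching them months later at working group last call.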

Thomas Graf: Hi, Shufang. Thomas Graf. First of all, the I-D template is really great to have; I'm looking forward to using it. Just one comment on the YANG tooling. Currently in the datatracker we are using pyang and yanglint to validate the YANG drafts. And as Mahesh is aware, when YANG extensions are there, it's quite important for the IETF, before we actually release something as an RFC, that it is actually supported by the tools. However, in the past those decisions were not taken, and I have one concrete example, which is the YANG structure extension. Currently, all the documents which use YANG structure are failing in the datatracker. libyang has recently been updated to support it; however, in the datatracker tooling we are still using the old version of libyang. So it would be really good to update that to the latest releases.

Mahesh Jethanandani: Joe, what do you think the state of—

Joe Clarke: Oh, sensitive. Hi, Joe Clarke. I put my name in the queue specifically for this. So, the datatracker tracks stable Debian packages for libyang. Per Anderson is a Debian maintainer, and he's offered to create a Debian package for a more modern version of libyang; 2.x is what we're using now. If he does that with whatever is cut from a release standpoint, Robert Sparks has said he would happily update it: we can do a pull request to the datatracker and get it in there. That seems reasonable. We're planning to do the same kind of thing for YANG versioning: we're going to need the new yanglint, the new pyang, and that kind of stuff.
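The underlying issue Thomas and Joe are describing is a version gate: checks that depend on newer YANG extensions only make sense once the packaged tool is new enough. As a hypothetical sketch (the version numbers below are made up for illustration, not real libyang requirements), the comparison itself is simple:

```python
# Hypothetical sketch of the kind of version gate the datatracker tooling
# could apply: require a minimum libyang/yanglint release before enabling
# checks that depend on newer YANG extensions (e.g. YANG structure).

def version_tuple(v: str):
    """'2.1.148' -> (2, 1, 148); tolerant of a leading 'v'."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

def meets_minimum(installed: str, required: str) -> bool:
    """True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)

# The exact version strings here are illustrative, not a real requirement.
print(meets_minimum("2.1.148", "2.1.80"))  # → True
print(meets_minimum("1.0.240", "2.0.0"))   # → False
```

Tracking stable Debian packages, as Joe describes, is what makes the "installed" side of this comparison predictable across datatracker deployments.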

Mahesh Jethanandani: Okay, thanks for that update. I also put myself in the queue, to help Shufang, since this is almost a recruiting event for YANG Doctors. You'd be happy to know that the transcription is rendering it as "young doctors", not "YANG Doctors". So, if that's incentive enough for you, you're the young folks. But maybe the question for Shufang is: what is the typical load for the current set of reviewers? If somebody is trying to sign up, how many documents per year, or per month, are they looking at reviewing?

Shufang: I think there is a wiki page that documents what a YANG Doctor should actually review. Generally the focus is the YANG module itself, but a lot of our YANG Doctors will also review other parts of the draft, like whether the text is aligned with the YANG module, and other things—

Mahesh Jethanandani: Right, so the question is what is the load meaning how many drafts are they reviewing? How many— each YANG Doctor, how many documents do they review in a month or a year?

Shufang: Yeah. Each YANG Doctor can set their own frequency for doing reviews. A majority of them like to review one draft every two months; I guess this is the most popular frequency. Others might be more frequent, or it might just be a repeated review, where the YANG Doctor has finished one review and then has to do another. Yeah.

Joe Clarke: Yeah, Joe Clarke again. I was on that list; I had five. I don't feel overwhelmed. It's maybe two a month, and it seems like less than that sometimes. And yes, like Shufang said, you review what you can, and you can change your preferences. I haven't changed mine, so what I get, I try to review. It doesn't feel nearly overwhelming. And with 8407bis, there are a lot of guidelines to help, too.

Shufang: Yes, yeah, exactly. Okay, thank you. Thank you.

Med: All right, let's talk about draft-ietf-opsawg-rfc5706bis, the guidelines for considering operations and management in IETF specifications. On behalf of the authors, let me clarify what we are doing here. RFC 5706 is dated: it is from 2009. It was good at that time, but there are a couple of problems: the content still speaks about MIBs; it targets multiple audiences at the same time; and the guidance is lost, because there is a mix of guidance and technical background. So what we decided is to update this document. For this OPS area discussion, we decided to give the rationale and explain what we want to do with this draft; this is a working group document in OPSAWG, and we will discuss specific text in that working group. So what's the rationale? Don't delay thinking about how to deploy and operate your new protocol. Because we received a lot of feedback from directorate reviews, we tried to clarify this with explicit text, so you can read the exact text that we put in the draft. And we tried to make it very clear: a core principle of this document is to encourage early discussion rather than mandating any specific solution. This is something we've been trying to solve for many, many years; it used to be, "Oh, operations, we'll do it later." What we also did is put in the appendix the typical questions you might consider: about operational fit, configuration management, performance management, and fault management. I encourage you to review the document, but if at some point you just want to ask yourself the right questions, you can also approach the document from the appendix. What we stress, because we had some questions about it, is that in the end the individual working groups will decide what is right for them.
Let me read the text to you: a working group may decide that its protocol does not need interoperable operations and management, or even a standardized model, but that should be a deliberate and documented decision, not simply the result of omission. So we give all this rationale and these principles at the beginning of the document. But if you do so, you must document the decision. And by the way, if you're paying attention, you will realize that I cut and pasted the wrong text there; I'll correct it for the next version. But the point is: it's perfectly fine if you don't want or don't need to address operations right now, but please tell us a little bit about why, at the time of the design. The last thing we want is for this to be a hurdle. We provide a framework that can help you: these are the typical questions you might think about, and this is how you could structure the information. But we don't want to make it painful for you; this is the key point. And we don't want to impose a solution on you. We don't mandate "you shall do it this way, you shall do a YANG model, you shall do IPFIX," whatever. We don't imply a formal model or a specific solution. We don't even require that you develop the solution directly; it might be in a different document or done a different way. We tried to clarify all of this because we got questions from all the directorates. One more thing we clarified, because we had this situation: let's assume I've got an old routing protocol, and we never thought about operations for it, fine. Now we do an extension. What we're saying is that we don't want the extension to be delayed because there is no operations work on the base protocol; that is not the intention. The extension should not be held up waiting for an operations and management solution to be developed for the base specification.
So we've been updating the document. To make it clear up front, we say there is a requirement for new RFCs in the IETF stream that specify a protocol or protocol extension to have an operational considerations section. But we stress early, in the abstract, that there is an escape clause: if you don't need one, simply document the rationale and we're good. We don't want to be dictators, right? That point was previously buried somewhere in the draft; now it is in the abstract. The authors meet on a weekly basis; we keep everything open with issues and PRs, and we have had three revisions since the last IETF. The fourth one might be posted very, very soon, maybe even now. I also want to mention that at the beginning this was an AD-sponsored document. To get even more feedback, it became a working group document in what I call the dispatch working group for operations, the OPSAWG working group. Bado Görtz, Joe Clarke, myself, and others have been working on authoring this document. When it became a working group document, we had an issue: both Joe and I are chairs of OPSAWG. As a consequence, Alvaro became the document shepherd, and in a way that's a good thing, because Alvaro has a routing background, and maybe routing is the first area that is a customer for this document. As I mentioned, there was a lot of directorate feedback, and by the way also feedback directly on the mailing list, not from directorates, which we still have to address and answer. I would say that generally there is support for this document, in the sense that there is good content in it. There is one comment about the compulsory nature of the section: whether it should be compulsory or not. Maybe the biggest highlight on this slide is that this draft has already been having a positive effect.
If we look at the IESG telechat from February, 57% of the documents already had this operational considerations section. If I look at the last one, from March 5th, it was 71%. And what pleases us as authors is that we see this not only from you guys in operations, but also from the different areas: routing, security, INT, etc. And we see this also across document types: proposed standard, experimental, and informational, which is a good thing. So, next steps: we're going to discuss this document in OPSAWG. We as authors have a couple of issues to resolve, and a version will be posted very soon. And actually, it's done. So this is real time: people say the IETF is slow, and here I'm talking about posting a draft and it's done right now, during the session. Thank you, Joe. So from an author point of view, we have addressed many of the comments. There are still some that need to be addressed and answered on the mailing list, fine; we're going to address all of them. And whenever that's done, it will be time for last call, getting even more feedback from the community. And this is my last slide, so if you have any questions or feedback, please. And we can post a new version right now. Right, Joe?

Chung-Feng: Hello, Chung-Feng from China Telecom. Yeah. In this document I noticed one sentence which says an operational considerations section will also often be appropriate in drafts advanced for publication as informational RFCs, right? But I noticed some informational drafts, for example problem statement drafts, which may advance as RFCs, and I don't think this kind of draft needs operational considerations. So what's your point?

Med: I think that you are fully right. It's exactly why we mention this escape clause. It says there, um, okay, the clock is in front of it: while providing an escape clause if no considerations are identified. A problem statement is typically something where you don't need to think yet about operational considerations. And to me you are fully right: we don't need it. So yesterday, for example, I was working on a document and I had a question from a co-author. It was an informational document, and the question was about the IANA considerations section: there was no IANA action in this document, and they asked why. And I explained, okay, because it's not a YANG model and all this. My point is that it took me like 30 seconds to write "There are no IANA considerations." You could do the same here for a problem statement: "There are no operational considerations right now because it's a problem statement." 30 seconds. So I agree it doesn't apply, but just writing this one sentence is not going to be a time killer.

Chung-Feng: Okay, thank you.

Bhuvanesh: Bhuvanesh, Huawei. Yeah. As OPS directorate chair, I think this is very helpful for all our OPS directorate reviewers, because I notice there is quite useful guidance for reviewing routing and OPS drafts, but for areas like security and ART there's not enough. But in the slide you showed there is a repo that also mentions drafts from other areas. I think that is helpful to a lot of OPS reviewers. So if there's a repo, we can send the link to our OPS directorate, and also add it to the template to show some best practices from the different areas. Thanks.

Med: We could do this, very good. I want to add two things. Do you want to mention the OPS directorate wiki that we have, because we extracted that from RFC 5706 to put it there? Do you want to say a few words about that for the audience?

Bhuvanesh: I didn't follow you.

Med: So basically let me make the first point. The first point I want to make is that the OPS directorate has got its own wiki now, where we extracted information from the old RFC specifically for the OPS directorate. And the second thing I want to mention is that this discussion sparked another discussion. You mentioned security. We started to have security considerations related to OPS, SECOPS, and it became too big for this document. So as a consequence we now have a brand new document being considered in OPSAWG about SECOPS, which I thought was great, and which should maybe not be the content of this specific draft. So this is sparking good discussions. Thank you. Thank you, Med.

Med: The next three slots will have a focus on AI and operations. We'll be having three operators that we invited from China here. So thank you again to the three of you for accepting to share with us your thoughts, your challenges, and also the work you are doing on this. Um, Qung, please.

Qung: Okay, thank you all. It is my pleasure to have this opportunity to share our experience from the operator side with all of you. First, a brief introduction of China Telecom. China Telecom is one of the biggest operators in the world: we have 440 million mobile subscribers and 240 million wireline broadband subscribers. Our core technical strategy is cloud-network convergence, and our overall operational goals are high scalability, excellent customer experience, and high efficiency. So we introduced integrated operations, bringing the network, the cloud, and IT together. We also offer efficient scheduling, fast response to our customers, and high-safety operation capability. One of the most important core platforms is what we call the Cloud Network Operating System. The CNOS is a unified platform which can control all the heterogeneous resources, including network and cloud, and also expose our network capabilities to upper-layer applications. It has four major components. The first is the Cloud Network Controller. It collects data directly from the underlying devices and can make configurations automatically. Also, once it collects data from a device, it sends it to our Digital Twin system immediately. Our Digital Twin stores the real-time operational data and captures the correlated relationship between the virtual and the real-world infrastructure. The third is the Integrated Scheduling System. It gives a full analysis of computing and network, and it decides, for each application, the appropriate resource pool and network path. And we also have the Network LLM; this is an intelligent engine, very important for our system, to make operations intelligent and decision-making automatic.
So with the help of our CNOS, we have achieved autonomous network levels 3 to 4, and the whole architecture has evolved from "plus AI" to AI-native. In the next part I will introduce some of the key technologies. The first is highly guaranteed software-defined technology. We have developed both the controllers and the orchestration by ourselves, including the access controller, metro controller, backbone controller, and DC controller. With these controllers, we adopt model-driven service configuration and in-band flow information detection, and we also use dynamic guarantees using SRv6. So we can use our own controllers to get all the data from the infrastructure in real time. The next one is the Digital Twin. As I mentioned, once the data has been collected by the controllers, it is sent directly to the Digital Twin. So the Digital Twin can collect all the data from the different controllers. It then has cross-domain, cross-network, and cross-cloud network data, and can expose APIs for upper-layer applications, like the remote true-view: it can give a view combining the customer level, the production level, and the network level, and can help our network fault detection. Another important part is the Network Large Model. We have been developing our Network Large Model since 2023, and we have built this model on 60 billion tokens of our high-quality dataset. We have also built a model matrix, including our language model, our time series model, and several task models. And we have offered four process tool chains, covering training, inference, intelligent orchestration, etc. For the agent roadmap, we have defined several phases. In the first phase, we are doing the co-pilot, including the knowledge co-pilot and the operations co-pilot.
And last year, we set up our own agents, including network management, maintenance, and network optimization. Up to now, we already have over 1,000 agents deployed nationwide in our production network. In the process of building our agents, we found some challenges we need to solve. For example, we need to define agent quality, we need to monitor the full life cycle of an agent, we need to give each agent a very clear responsibility and boundaries, and we need to define its authority. Security is also very important: we need to manage the security risk. And multi-agent communication is also very important. So we use the name "digital employees" for our carrier-grade agents, and up to now we have over 1,000 digital employees in our network. So how do we make these digital employees a reality? Here we introduce AgentOps. AgentOps includes the Digital Employee Platform, which manages the agents, and the ACS, the Agent Communication System. The DEP, the Digital Employee Platform, acts like an HR department. It clarifies an agent's job responsibility, separates human and machine privileges, and handles onboarding, offboarding, transfers, audit, traceability, and the performance view. The ACS is the basis for multi-agent communication and collaboration: it does agent addressing, agent naming, agent assurance, etc. Combined, these two make the agents operational in our network. We also added agent observability to our network, covering the agents, the LLM, and MCP. We defined many kinds of metrics in the system, and we collect these metrics via the OpenTelemetry protocol. Okay, then I have several examples to share. The first one is an entry application named Xiaoqi. This application has now become the entry point for our employees.
It is based on our Quattern and AI infrastructure, it is a unified portal for both co-pilots and agents, and it also redefines our personal office interaction. Up to now, over 50,000 employees are using it in their daily work. Another one is the AI agent for home broadband self-service. When we introduce AI into this system, the workflow becomes different. When we get a question from a user like "My home broadband is unable to connect to the internet. What's the reason?", the agent will first retrieve some related data from its database, and it will also query some real-time APIs, for example whether the account is okay, what's the status of the optical modem, etc. It will also combine this with the prompt, and in the end it can generate the answer step by step, which is easy to use and serves as a guideline for our employees to fix the problem. These new methods have changed the traditional way of working for our digital employees. The first point is task completion: in this way, we get closed-loop problem solving and decision making to help our employees. The interaction is user-centric, in natural language, and it behaves dynamically and automatically. Also, it can improve itself very quickly. So it has improved our efficiency by 30%. Overall, we think it is very important to add AgentOps to our operational network, so we still need to define some more protocols, covering agent communication, observability, and security, and make our operations more intelligent. Thank you all.
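[Editor's note] The agent observability that Qung describes, defining per-agent metrics and collecting them via the OpenTelemetry protocol, could be sketched roughly as below. This is a stdlib-only illustration under stated assumptions: the `AgentMetrics` class and the metric names are hypothetical, and a real deployment would register these as OpenTelemetry instruments and export them over OTLP rather than printing a dict.

```python
from collections import defaultdict

class AgentMetrics:
    """Minimal stand-in for per-agent observability.
    In production these counters and latency samples would be
    OpenTelemetry instruments exported over OTLP; here they are
    plain in-memory structures for illustration only."""

    def __init__(self, agent_id):
        self.agent_id = agent_id
        self.counters = defaultdict(int)  # tasks started / succeeded / failed
        self.latencies_ms = []            # per-task latency samples

    def record_task(self, ok, latency_ms):
        # One data point per completed agent task.
        self.counters["tasks_total"] += 1
        self.counters["tasks_ok" if ok else "tasks_failed"] += 1
        self.latencies_ms.append(latency_ms)

    def snapshot(self):
        # The kind of summary an AgentOps dashboard might show:
        # volume, success rate, and a median latency.
        total = self.counters["tasks_total"]
        p50 = (sorted(self.latencies_ms)[len(self.latencies_ms) // 2]
               if self.latencies_ms else None)
        return {
            "agent_id": self.agent_id,
            "tasks_total": total,
            "success_rate": self.counters["tasks_ok"] / total if total else None,
            "p50_latency_ms": p50,
        }

m = AgentMetrics("digital-employee-0042")  # hypothetical agent identifier
m.record_task(ok=True, latency_ms=120)
m.record_task(ok=True, latency_ms=80)
m.record_task(ok=False, latency_ms=400)
print(m.snapshot())
```

The point of the sketch is only the shape of the data: success rate and latency per agent are the kinds of metrics that make "responsibility, boundaries, and traceability" measurable.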

Med: Thank you very much for the great presentation. So you mentioned you're one of, or maybe the biggest operator in the world, and you are on all fronts: digital twin, autonomous network level 4, LLM, AI agents, OpenTelemetry, MCP, all of these. So what do you expect from the IETF? What should we be solving in a standard way out of all the things that you're mentioning here? If you had to give your top three priorities, right, because I see a lot of side meetings about some of the things you're doing, so, top three priorities for the IETF, please.

Qung: Thank you, very good question. Yes, it's a wide area; there are a lot of topics. I think what we really need right now is how to make our agents operational and manageable. This is the most important part for us now. Also for the digital twin, for network data collection, some new features are needed, and we can discuss more offline. Yeah.

Med: One more question if I may. So when you say "agents more manageable," what does it mean? Is this discovery? Is this capability discovery? Is this the communication? Can you expand a little bit, because at this IETF it actually goes in all directions, with side meetings discussing everything about AI agents, you name it.

Qung: Okay, thank you. What I mean is how to identify the agent: the agent's responsibility, boundaries, authority, and traceability. I think all of these are very important for us.

Med: Thank you very much.

Fenyang: Hello, good afternoon. This is Fenyang from China Mobile. Today, the topic is to share some thoughts on the challenges and practice of the operations of our IP network. So, a brief introduction of China Mobile: more than 1,000 million customers, 2 million 5G base stations, and more than 1 million servers. Those are big numbers, and from those numbers we can understand why we need automated operations. One reason is better and faster service quality for our customers. Another one is cost reduction, considering that in recent years we have made great investments in the network, mainly in the AI-related network. Another one is the complexity brought by the convergence of the network and the cloud, because if we consider that there are routers running on servers or virtual machines, that is quite complex. The last one is that AI advancement gives us the opportunity for smarter operations. So, there are several challenges for AI ops. AI is driven by data; data is like the fuel of AI. So the first problem is the data. The data comes from different silos, and it is inconsistent and overwhelming. That is number one. Number two is that we need a more efficient way of communication, which becomes necessary when the data comes from multiple vendors. Normally we need a long journey to standardize the data format, the data content, and so on. Consider also, for example, that when something happens in the network, there are many iterations for the controller to get the data from the network, do analysis, and then drill down. There are many iterations here. That's what I mean by the lack of efficient communication. The last one we cannot solve in the network itself; most of such things will be solved by the operations process. So, about our basic AI operations picture: there are two objectives.
One is to support our customer groups, and the other is the better quality we mentioned just before. The objectives are realized through three loops: the customer requirement loop, where customer requirements are broken down into cross-domain services, which are then broken down into intra-domain resources. All of the loops are supported by four layers of intelligence, from the device level to the network level, the service level, and the business level. And we have followed a general architecture for AI operations, whose main objective is to turn data into information, then turn information into insight via the machine learning engine, and in the last step take the insight and take some actions. But here, most actions need human confirmation, to prevent issues such as AI hallucination. So, let's take two examples of how we are doing the operations. From the picture, we can see on the left side that we are doing data collection from various types of data. The most important is time series data, which is mostly for performance. We also have alarm, ticket, and topology data, and we put them together into the middle part, which is the model. In the model part, we have some domain-specific models, which are trained specifically for telecom. There are also some agents here: we build function-specific agents to get insight from that kind of data. Those agents output anomalies, like metric events. We can also do performance and fault prediction based on this, and sometimes root cause analysis as well.
After root cause analysis, sometimes the system can come up with the solution, and lastly we need human confirmation before we can take action. More than 50% of faults and similar issues are already handled by the system, in most cases within several minutes. The second example is SRv6 path planning. The main process is that we take the current network data, and we also make predictions about the future network state. Based on the current network status and the predicted status, we do the path calculation, hoping to get the best and most stable path. The most important difference from the first example is that this path planning is almost real time: we can calculate more than 1,000 paths per second. So, I have talked a lot about data, and data has become very important. I have one story about why we should move forward on this. The story is that today the network infrastructure itself has a lot of computational ability. This is very different from what we had years ago. In that case, the network devices themselves can do some of the analysis, and the relation between the network devices and the network controllers can become a kind of peer-to-peer relationship. That means that if we have intelligence on the devices and intelligence on the controllers, they can talk in a smarter way. What they exchange is a kind of semantic communication. With that in mind, we can see this will eliminate the long time needed to standardize, for example, telemetry data, and sometimes we can also compress the data quite a lot. So that's my idea for today.
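[Editor's note] Fenyang's idea of devices sending insights instead of raw telemetry can be illustrated with a small sketch. This is a stdlib-only illustration under stated assumptions: the `summarize` function, its field names, and the 3-sigma threshold are hypothetical, chosen only to show how a device-side analysis step can compress a raw counter series into one compact semantic message for the controller.

```python
import statistics

def summarize(interface, samples):
    """Device-side analysis: reduce a raw utilization time series
    (one sample per interval, in percent) to one insight message.
    The anomaly rule (peak beyond mean + 3 standard deviations)
    is illustrative, not from any standard."""
    mean = statistics.fmean(samples)
    peak = max(samples)
    spread = statistics.pstdev(samples) or 1.0  # avoid a zero-width band
    return {
        "interface": interface,
        "mean_util_pct": round(mean, 1),
        "peak_util_pct": peak,
        "anomaly": peak > mean + 3 * spread,
        "samples_compressed": len(samples),  # raw points replaced by one message
    }

# 300 raw samples become a single message: a large reduction in what
# the controller has to transport, parse, and index.
raw = [20.0] * 299 + [95.0]
print(summarize("eth0/1", raw))
```

The compression ratio here (hundreds of samples to one message) is the point of the story: with intelligence on the device, the device-to-controller exchange carries meaning, not volume.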
So here come three open questions. Question one: should we reconsider the roles of the network devices and the network controllers in this era? Do we need a new framework? Question two: should we consider semantic communication between the network device and the controller? Question three: should we define the data, for example the insight, or more precisely the insight or the information, communicated between the network devices and the controller? Those are the three open questions. That's all for me today. Thank you.

Med: Yeah, thank you. Thank you for sharing with us the conclusion of this work. And I think it's a good opportunity to thank the whole community, all the DNS people who are very excited about this work, for the positive feedback and the constructive discussion. I found the way this happened really exemplary, with all the feedback you have received. And a really special thank-you to you, Wes, for accommodating and meeting all the milestones that we had for this consultation. I am really impressed that, against the planning we had when we started this consultation, we have met all of the milestones. So thank you again, Wes, for the work. I will be waiting for the end of this week, when you will be releasing the new version of the draft. Then my plan is to formally conclude the consultation and then start the discussion with the IESG members on how we will start implementing; not exactly what we will have there, but there are some logistics and some other aspects that will have to be taken into account. And we will be keeping the community informed.

Wes: Great, thanks. Bye all, have a good day.

Thomas Graf: Yes, hello. Thomas Graf, Swisscom. I really like your slides. I think you're really pointing out the importance of getting the right data from the network, focusing on semantics, and I fully agree on that. On your questions, like questions two and three, I believe I can answer that. We already have quite some good protocols at the IETF, like where we define the schema, which is YANG, which has its own data taxonomy and also has capabilities on the semantics. And I fully agree that basically we need efficiency in terms of how we are collecting data, how we are processing it, and also how we are indexing it. Currently we are trying to tackle that problem in the NMOP working group, so it would be great to collaborate on that.

Fenyang: Yeah, thank you.

Dhruv Dhody: Dhruv Dhody, Cisco. First, to the presenter just before you, and to you: those were great slides, a great presentation. From a network management perspective, it feels like we're sometimes struggling with what we have to solve. To me, your slide four is very obvious, but if we are having more of these presentations in the future, perhaps we should call it out, like a section that says "What do we expect from the IETF?", so it's very obvious. Because it feels like people don't really see it if they have never run a network; for the people that have run networks, it's very obvious. So maybe for the future, call it out: what do we expect from the IETF? But again, great presentation. Thank you.

Fenyang: Thank you.

Jingzhao: Well, hi everyone. I'm Jingzhao from China Unicom. Today I will introduce China Unicom's intelligent operation and maintenance practices and reflections. First I want to give an overview of China Unicom's IP network. As a major integrated telecommunications carrier in China, we provide domestic and international communication and digital information services. The core IP network architecture comprises the China 169 backbone network, the IP metro plane network, the CUII industrial network, the smart metropolitan network, IP bearer network B, and others. To manage such a large-scale and complex IP network efficiently, we aim to equip the network with resilience-enhancement capabilities for more intelligent operation and maintenance. After doing more and more research work, we collected the key points from some key aspects across three phases. The first is pre-event; here I want to introduce the points in network simulation. The first is cross-layer co-simulation. The problem is that we have difficulties in physical-IP-application collaborative simulation, making it impossible to build full-scale twin models for complex environments. The current state is poor inter-layer coordination, due to the lack of unified mapping and logical alignment across protocols. The next is lag in rule generation: we can't rapidly and automatically generate simulation rules for diverse scenarios and dynamic demands. The current state is that manual and semi-automated simulation configuration is slow and inflexible, unable to handle complex network elements and protocols. The third is insufficient data model fusion: a gap between macro-level business forecasting and micro-level device validation. And the fourth is the challenge of real-time performance and accuracy: we cannot meet the requirement for real-time, accurate realization and zero-error validation in large-scale networks. The next phase is in-event; here I want to introduce the points in traffic monitoring.
The first problem is the lack of deterministic end-to-end localization, because in a large-scale and complex network, current telemetry techniques, such as iFIT, cannot achieve end-to-end monitoring due to insufficient multi-vendor support. Compared with single-vendor scenarios, cross-domain services mean a longer mean time to repair due to inconsistent telemetry data. The next problem is that real-time multi-dimensional analysis of massive traffic remains unachievable, because flow methods rely on sampling, and high-precision on-board analysis needs high resource overhead. The next problem is the blind spot in traffic change detection, because IP-based monitoring suffers from header attribute degradation caused by cross-domain IP changes and NAT translation. The next problem is the mismatch between network monitoring and user experience, because current methods are IP-centric monitoring, not application-aware. Without identifying specific applications, the network cannot provide differentiated services. The next point is post-event, on live-network operation and maintenance issues. The first is that long-tail risk exists after network cutover, because improper configuration or code defects often cause periodic faults weeks or even months after deployment. The next problem is that abrupt changes in traffic flow direction are hard to detect: configuration changes in upper-layer applications cause IP traffic surges or shifts, leaving the network layer only able to respond passively. And the next problem is that the costs of response and fault troubleshooting are excessively high, because complex fault troubleshooting still relies on manual expertise and cross-departmental coordination, lacking fully automated methods. So, to address traditional operation and maintenance challenges and enhance network survivability, networks must be equipped with comprehensive resilience capabilities across the full life cycle.
Here are some capabilities that we want the network to achieve. Based on the analysis, we carried out some practices to address these challenges. The first is IP-native simulation systems. To address the core challenge of insufficient high-fidelity dynamic construction capability in network simulation, we propose a digital twin network architecture driven by integrated numerical simulation and emulation. Our systems have two capabilities. The first is deduction analysis and dynamic rollback in cutover simulation. Cutover simulation requires configuration-level simulation: extracting operation, state-based rollback, and execution steps from the live network, with unmanned operation for step-by-step impact analysis. The second is intelligent network traffic prediction. Traffic analysis requires numerical simulation: we build spatio-temporal deep neural networks for end-to-end network traffic analysis, and issue real-time monitoring for traffic beyond the predicted region to guard operations. The next practice is the IP network traffic monitoring and analysis platform. To address the core invisibility challenge in network traffic monitoring, the platform provides an end-to-end IP traffic monitoring and analysis system covering the home-broadband network, mobile network, bearer network, and applications. The platform's capabilities are as follows. The first is multi-domain full-stack collection, eliminating cross-professional barriers for end-to-end multi-domain traffic quality data collection. The second is routing monitoring and analysis, establishing routing analysis to turn invisible traffic into visible digital metrics. The next is IPv6 capability assessment: compare IPv4 and IPv6 live-network performance, identify deployment bottlenecks, and support the IPv6+ rollout with data. And next is granular traffic flow insight: refined traffic analysis down to the province level, supporting inter-domain settlement, planning, and service forecasting.
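[Editor's note] The "monitor traffic beyond the predicted region" step that Jingzhao describes can be sketched as a simple band check. This is an illustrative sketch under stated assumptions: the `out_of_band` function and the fixed 20% tolerance are hypothetical, and a real system would derive its band from the spatio-temporal prediction model rather than a constant margin.

```python
def out_of_band(observed, predicted, tolerance=0.2):
    """Flag time slots where observed traffic (e.g. Gbps) falls
    outside the predicted value plus/minus a relative tolerance.
    Returns one alert record per out-of-band slot."""
    alerts = []
    for t, (obs, pred) in enumerate(zip(observed, predicted)):
        lo, hi = pred * (1 - tolerance), pred * (1 + tolerance)
        if not (lo <= obs <= hi):
            alerts.append({"slot": t, "observed": obs, "predicted": pred})
    return alerts

predicted = [10.0, 12.0, 15.0, 14.0]
observed  = [10.5, 11.0, 22.0, 13.5]  # slot 2 surges beyond the band
print(out_of_band(observed, predicted))
```

Only the surge in slot 2 is reported; traffic that tracks the prediction generates no alerts, which is what lets the guard run continuously at scale.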
The next practice is the intelligent computing center network operation and maintenance platform. This system provides end-to-end intelligent operation, control, and resource optimization for intelligent computing services via three core capabilities of the computing-network convergence layer. The first is full-spectrum perception and multi-dimensional monitoring: full-domain topology, multi-dimensional fault availability, real-time millisecond-level traffic, lossless metrics, and hardware resource monitoring. The next is full-scenario test and full-scenario business support, covering the entire process of task execution, including large-scale AI training and large-model fine-tuning, for example with on-demand computing resource scheduling. The third is multi-mode optimization; the first part is large-model traffic prediction and simulation. Based on the platform, we built an intelligent computing center operation, maintenance, and control knowledge base, developed an in-house operation and control intelligent agent based on reinforcement learning and AI large models, and enabled the implementation of custom operation, maintenance, and control scenarios through capability exposure. Well, those are our practices. If you are interested in our work, please reach out to us. Thank you for listening.

Med: Any questions for Jingzhao? Maybe you can answer the same question Benoit asked for the first presentation. If there are three topics that you think the IETF should work on, can you list them, if possible?

Jingzhao: Our presentation comes after a lot of research work on the live network, where we collected the core pain points for intelligent operation. We want the network to have capabilities that enhance network survivability. That means the network needs to have resilience capabilities.

Med: Yeah, thank you.