IETF Meeting Minutes

Session Date/Time: 16 Mar 2026 08:30

Jean-François: Good afternoon everyone, and thank you for joining this NMRG session. So I'm Jean-François, I'm co-chair of this research group with Jefferson Campos Nobre. So this is the first session of NMRG. So, first, the Note Well regarding intellectual property. So, the NMRG is a research group from IRTF. IRTF follows the IETF intellectual property right disclosure rules. So as you can read here, but you have also those RFCs that you can consult and the links, that in particular if something that you present or discuss is covered by patent or patent application, you should disclose it in a timely manner.

Also, you have to know this point. In particular, the IRTF routinely makes recording of online and in-person meetings, including audio, video, and photographs, and publishes those recordings online. So if you participate in person and choose to not wear a red—do not photograph lanyard, then you consent to appear in such proceedings. And if you speak at the microphone, appear on a panel or carry out an official duty as a member of IRTF leadership, then you consent to appearing in the recording of you at that time. Also if you participate online and turn on your camera and/or microphone, then you consent to appear in such recordings.

Regarding the privacy code of conducts, so again, here are important points that you have to read and also reference to the different documents. So, beyond what has been said regarding the recording, it's important that personal information that you provide to IRTF will be handled in accordance with the privacy policy that is mentioned here. And as a participant or attendee, you agree to work respectfully with other participants. Then please contact the OMBUDS team if you have questions or concerns about this.

So also, as NMRG is a research group, it's important to remind you the goal of the IRTF that is different from the IETF. So IRTF focuses on longer-term research issues related to the internet, while the parallel organization, the IETF, focuses on shorter-term issues of engineering and standards-making. So IRTF conducts research, it's not a standard development organization. So while the IRTF can publish informational or experimental documents in the RFC series, its primary goal is to promote development of research collaboration and teamwork in exploring research issues related to internet protocols, application architecture and technology. And we also reference the RFC 7418 about an IRTF primer for IETF participants.

Here are the useful links for today. So please also feel free to help us, Jefferson and I, to take notes if you think we do not catch something properly in the notes. And I will not talk too much because, as you probably have seen in the agenda for today and the next session, it's quite a dense agenda. So here I just list the presentations of today. So as you can see, most of the presentations are related to AI, but the last one will be also related to the NMRG draft about use cases of intent-based networking.

I just want to highlight there will be another session this week in IETF on Thursday. And what is not mentioned in the slides, but it's also important and that you may have seen the announcement on the mailing list, that we will have an interim meeting on Saturday that is a joint session with ETSI ZSM that will be also held here.

So after this quick introduction, I think we can start with our first speaker. So I invite Wenlong Ding for the first presentation about Towards Intelligent Network Configuration Management with LLM.

Wenlong Ding: Okay, I guess I'll start. Hello everyone, I'm Wenlong Ding from the Chinese University of Hong Kong and I'm a PhD student supervised by Henry Hong Xu. And today I'm very glad to share our works about using LLM to manage a large set of the network configuration management tasks.

Okay, to begin with this topic, we first like to say that configuration management is really complex. Since there are three aspects. The first is there are many large-scale devices and we can see that there are one million devices for each typical DC region with more than 70 DC regions in the large network and there are more than 1000 devices for the WAN. And the second is that there are many huge number of configuration lines, such as first that for a typical device, the configuration lines exceed one million lines and there are many protocols such as Echo, BGP, IGP and some other components such as route map, interface and authentication configurations. So the third one is that the configuration will update really frequently and from a large production network, we found that there are over 100 manual updates every week and this include many aspects such as the new device prefixes and new policies groups, extra. So that we give a conclusion that the operators need many automated tools here.

So here we show that there are three aspects of our tool suite. The first is understanding the existing network. It receives the query network flows and output the flow intent in the existing works. And the second is the new config intent and we input this to perform the intent-based configuration and its output is the new device configuration scripts. And the third task is about the configuration verification and its input is about the properties to be verified and output the final script that is verified for deployment.

Okay, so the first work is about understanding the existing configuration, which we aim to in search of the lost time, which is the intent recovery in the network configurations. So the key problem here is that given an observed network state of interest and infer why it exists with the evidence, such as we ask a flow about why can prefix reach another prefix on a certain port. And our intent recovery asks why the property holds instead of the verification task that asks whether the property holds or the specification mining task that asks what the property holds. And there are three rationales behind this topic. The first is the why the problem exists, it is about because of the evolution of existing configuration is lossy due to the context decay. The second is why the work is hard, it is because that this task is ill-suited with inherent ambiguity, such as the configuration for the intent are distributed across the devices and times. The third that is why this deserves, why this topic deserves solving, it is because the recovered intent enable safe and subsequent intent-based network networking tasks.

And okay, so our key insight behind this is that we find that the snapshot alone is undeterministic, but the original design leaves the residual traces in the surrounding artifacts. So our insight here is to use some meaningful context as the regularization. So what is the useful traces and how to mine the context? And we find the first useful context is about semantics and the data source is about IPAM, which is a mapping table which maps the prefix or entities to its names, tags, or the hierarchy relations. And the challenge behind using this context is about the search space exploration, which is the many-to-one entity mapping. And second is about the provenance, which data source is a configuration snapshot and the challenge behind this is the reverse tracing, that is the distributed configuration across device for a typical certain flow intent. And the next, the three context is about history, its data source is from the Git evolution and the challenge here is about shadowing detection, which is the configuration of the different intents override each other.

Okay, so there are we developed three engines to resolve these three challenges. The first is about the semantic engine which we build a semantic context forest and use the unrelated tag pruning to get some multiple-level semantic candidates. For example, if we ask about the prefix about 10.0.2.8 here and we can get all its tags and meaningful metadata tags here. And the second is about the provenance engine, which we first get the all configuration on the traceroute devices. For example, we use the traceroute between two prefixes and we get all the configurations on this traceroute and then we use the LLM to perform the affinity-based filtering for this intent flows and finally get the most contributing configuration lines here. And the third is about the temporal engine and we here is to—our key idea is to replay the Git history and compare each other and use LLM to analysis which is the most related to intent flows and current state and then we get a timeline of the related rules.

Okay, so the—our key idea here or the primary design is to use a deterministic heuristic pipeline that progressively applies each context dimension as filters here and our solution here says that this for pipeline achieves 71% exact match accuracy and 79% intent F1 score here. And there are some future works to further optimize this, which we can build a joint optimization framework as the context in the three dimensions are intertwined.

And the second task is to conduct intent-based configuration and we use Echo and or the access control list as an example. Okay, so we can see that for the common manual configuration, operator needs to infer the policy or the intent to the attribute specifics and then deploy and verification through a huge loop. And so there are some automation challenges here about the intent configuration reasoning, which there are some complex syntax and network specifics in the low layer configurations and the conflict with existing rules, such as numerous and intertwined existing rules are very complex to solve. And the last one is about optimizing the deployment because naive method deploy the redundant rules. So we use three modules to solve this.

And the first is that there should be a—okay, there are two major challenges within this module. The first is that LLM is prone to hallucination, so we build a Echo intermediate representation templates with some well-designed prompt such as the Chain of Thought reasoning and to resolve this. And the second is that LLM lacks of the network specific information, such as the gateway and interfaces at the prefixes. So we prompt LLM with the and semantic network mapping table, which provides the network specific information that LLM lacks here. So that we can see that using this LLM can translate and natural language intent into the Echo rules and the deployment gateway information here.

And the second is that there are—if we use for the conflict detection, if we use the simple ways to, such as checking the overlapping and the opposite actions here, there are some false positive cases. The first is that because of the—there are some false positive because of the preceding rules and second is that some identified conflicts are not on the target interface. So we use two modules: the first is the truly match flows-based conflict detection and second is the interface path validation to resolve these two positives. And we after we get all the conflicts, we ask LLM to ask operators to propose the subset of conflict flows and to preserve the existing rule that should maintain the old actions, that we says it is the resolution with operators of the protect mechanism.

And here that for the deployment, we also develop some compared to the naive endpoint deployment method which deploys the rules at the gateways, we also have some novel observations to reduce this total deployment rules and we finally use a joint optimization formulation to formulate this observations. And as the result, we can see that for these three modules, it can have some low latency and have some further accuracy, detection accuracy, or the and the deployment accuracy than before.

And the last is about the configuration verification and the key question behind is that the current formal tools such as Batfish suffers from low speed and high memory, but our—but in the large networks, the there are strict time limits and the large cloud network scales. So the formal verification tools such as Batfish cannot scale to this network requirement.

So that we can see that—so we opt to the selective verification and we only use partial tools with the configuration for the formal verification. But we can see that in this example, if we select the shortest path here, we will select the wrong routers to analysis and finally unable to identify the key reasons that to identify the failure of this reachability. So that we can see that our conclusion is that naive or heuristic selection is error-prone without knowing the holistic configuration or the routing situation across the entire network.

So our key idea is to use the LLM-based lucky analysis to infer the routing states before and after configuration changes and identify the affected routers across the entire network. So the rationale behind this is that this approach offers high confidence than heuristics as LLM can rapidly reason about across protocols and it is much more confident than what humans do.

So that we can we mainly do two evaluations. The first is about naive prompting that we prompt all the configuration and topology information to the LLMs with detailed instructions. And second is use agent that we provide LLM with some tools to extract configuration and topology and the design docs. And we can see that for the task completion time and accuracy, the naive prompting configuration or topology information still suffers from scalability issues due to the context window, but the agent can have better scalability than the Batfish. And second is about the accuracy and we can see that for the agent mode, its accuracy is very high. So let's our conclusion is to let LLM play more and analysis less through the agent mode, that can better unlocks its potential.

Okay, so that's all for this speaking. And our next step is to do some more detailed implementations and evaluate on some real production networks and seek for some opportunities for real deployment. So thank you. I'm free to take your questions.

Jean-François: Thank you. So we have time for one quick question.

Muhammad: Yeah, quick question. Very nice talk. So I wonder how much time can this system save for the human operators? Like, did you test or evaluate? Thank you.

Wenlong Ding: So for the—for the first and the third topic, that because of network scale is very large, the human cannot—typically cannot analysis the whole network. So that is human infeasible, so we make it the holistic analysis within some hours or minutes. And for the second intent-based network configuration task, we have compared with the state-of-the-art work, we can have a 20 times faster than the current intent-based networks rules. So thank you.

Jean-François: Okay. Thank you very much. Thank you.

So let's move to the second talk that is—that will be given by Abdelkader Mekraouech, and he is online. So just Abdelkader, let me just prepare the slides and I will give you the control of the slide then. Let me enjoy the opportunity and ask for the people in the room to log in to onsite too, because we need this login in order to count the number of participants. Thank you.

Abdelkader Mekraouech: Hello everyone. Do you hear me well?

Jean-François: Yes, we can hear you very well. And normally you should have the control of the slide now. So please, please.

Abdelkader Mekraouech: Yes, I have it. Thanks a lot. So, my name is Abdelkader Mekraouech. I'm a researcher at EURECOM. I'm very glad to be here today, and I'd like to thank the chairs for inviting me to this NMRG meeting. So in today's presentation, I will talk about our approaches on agentic AI for the intent-based networking concept.

Basically, in this presentation, I will start with presenting the intent-based networking concept. Then I will move to our two approaches using agentic AI and also of course large language models in intent-based networking. So I will start with intent translation and then intent assurance, and finish with a conclusion.

So let's start with IBN. Basically in cellular networks, we have what we call the infrastructure layer, that will ensure the connectivity between the UEs and some 5G network or private network. So this infrastructure is composed of different technological domains. The first one is the Radio Access Network to ensure wireless connectivity between the UEs and the gNB. Then we have the Core Network to ensure authentication, session management, etc. And we have finally some vertical applications deployed at the edge cloud. So this is the infrastructure. And to manage this infrastructure and the resources, the users need to communicate with a management layer, and we call it here the OSS, Operations Support System.

Basically, in traditional network management, traditional network management frameworks rely on low-level technology-specific configurations, which is error-prone and poorly suited for autonomous networks like beyond 5G networks or 6G networks. To tackle this, standardization and research are moving—are pushing towards the use of this new concept called intent-based networking, which will simplify network management and enable the evolution towards autonomous networks. So in this concept, the users will use their intentions, or what we call here intents, to manage the network. And these intents abstract the complexities by allowing users to express what they want rather than how to implement it. It also supports closed-loop control, which allow system to adapt continuously to service objectives without any human intervention.

So IBN is composed of different stages. The first one is intent reporting, where the users will declare the intent using declarative structure. Then we have intent translation, where the high-level intent will be translated to low-level configurations. And then these low-level configurations will be activated in the intent activation phase. Then we have intent assurance, which is basically a closed loop that will ensure the intent respect the specified requirement. And finally, we have intent reporting to provide feedback to the users about the status of their intentions, or intent.

Actually, in current IBN, we have different standardization institutions that are defining the way users communicate with the OSS. We can mention 3GPP, TM Forum, and ETSI. And all the standardization bodies, they are defining OSS API endpoints and JSON structures. So in order to communicate with this OSS, the users need to use API endpoints and create these JSON structures. You can see an example of this JSON structure on the left side. This is called the NSD, Network Service Descriptor, from ETSI. And users need to generate this in order to define their intent. The challenge here is that the users must understand these APIs, which complicate IBN adoption.

In our approaches, we want to move towards the use of natural language to define the intent, which is the simplest form of intent. Basically, this will remove the API complexity. It will allow users to manage network without any prior knowledge, and also it will enable more automated and scalable intent-based networking. So in this setup, the users can use natural language in order to define their intent and send them to the OSS, to the management layer. And also, the network can provide feedback to the users in natural language. Of course, here we need an AI approach that can understand and generate this natural language text, where LLMs in this case are the best candidates.

Now let's move to intent translation with LLMs. As I said here in this intent profiling phase, in current IBN, the users are using API endpoints to define their intentions. You can see here an example of the OSS from EURECOM. We have our Operation Support System, and this is the set of API that the users have to use to manage the network. On the left side, you can see the set of API endpoints. We have different API endpoints to manage the infrastructure, to manage the services, etc. And you can see on the right side one API endpoint, which is service create. So if the users want to deploy service on our infrastructure, they need to call this endpoint and they need to generate this JSON, which is called the NSD. So this is the current intent structure.

So OSS APIs have many endpoints, each for different functionality. So we have endpoints for managing infrastructure, for managing resources, and also endpoints for managing services. The problem here is that each endpoint have a diverse JSON structure depending on the standard. So 3GPP, TM Forum, ETSI are defining different JSON structures. And this of course complicates IBN adoption because users need to understand these endpoints and need to understand how to create their JSON bodies in order to use this IBN system.

That's why we want to use natural language to express intent, and then we will perform translation to low-level configurations with an LLM-powered approach. This is an example of an intent in this setup. The user can say, "deploy a 5G communication service on the most available parts of the virtualized infrastructure" in natural language. And actually, this intent will be translated to three API calls in our OSS. So the first one is get VIMs to retrieve all the infrastructure, then post resource availability to check the availability of each infrastructure, finally post service create to create the service. And before creating the service, the users need to create this JSON structure.

So in this work, we need an AI approach that can move from natural language intents to a set of API calls, a workflow of API calls with the appropriate JSON for each API endpoint, and then execute these API calls to fulfill the intent. So this is the task that we want to do with the LLM. But of course, this set of tasks is very complex for one LLM agent to handle. This is why in this approach, we are using multiple LLM agents under the umbrella of agentic AI to fulfill the intent, each responsible for a specific role.

This is why we introduce this framework called OSSGPT, short for Operation Support System GPT, which basically translate natural language to a set of API calls, generate their JSON, and execute them sequentially to fulfill the intent. You can see the design of OSSGPT here in this figure. It is between the users and the OSS. And basically, it is composed of four main agents. So the first one is the assistant agent. This one will interact with the users using natural language. And so this assistant can respond to generic questions to the users, but if the user asks questions on how to manage or to manage the infrastructure, then this assistant will forward the request or intent to the planner. The planner will set the set of API calls to fulfill the intent. So it will choose which API calls and in which order. Then it will request the executor to execute each API endpoint based on some tools. And finally, we have the reporter agent that will see the conversation history and report back the feedback to the users using natural language.

So we are using here, as you can see, four main agents that are working collaboratively in order to fulfill the natural language intent. All these agents are trained using in-context learning. So we are using a generic large language model and we are training them using prompt engineering techniques. However, we have another LLM here. The executor is executing API calls using some tools. And before executing the API call, this executor need to generate the JSON structure, in our case we have the NSD, Network Service Descriptor. So for this, we created this tool called the blueprint generator, which is an LLM that will generate NSDs from natural language. In this case, we developed this LLM using a fine-tuning technique and we call it the NSD expert.

So let's see more in details on how we created this small language model. The task was to generate NSDs from natural language. So the first challenge was how to create the data set. Since no public data set available in the state of the art, we built our data set from scratch. So we started with 100 high-quality intent/NSD pairs and then we augmented this data set using LLMs in context learning. And then we fine-tuned an existing LLM with the LoRA technique. So LoRA is a parameter-efficient fine-tuning that allowed us to inject this NSD expertise of an existing open source LLM.

For implementation details, we are using two machines. The first one is on the bottom one, it's a Kubernetes cluster which will host the infrastructure components. You can see here we have the Radio Access Network, Core Network and Edge deployed. So we are using the 5G stack from OpenAirInterface. Then the second machine that will host the management components. So it will host the OSS components. Here we adopt a microservice architecture in our OSS. And then also we deploy in the same machine OSSGPT. And OSSGPT is using two main LLMs: so GPT-4 for the main agents assistant, planner, executor, and the reporter, and it is using also the NSD expert, which is the LLM that we created responsible for generating NSDs. And we used Ollama to deploy this LLM. Of course, this is inference, but for training, this NSD expert was trained using an NVIDIA A100 GPU. We use Llama 3.2 3-billion-instruct open source LLM as the base model. And then we trained it with the LoRA technique using the Unsloth framework.

Okay, let's move to the demo now. We consider in this demo that we have three infrastructures in the EURECOM testbed, so three clusters. And the user, a random user, will deploy one Radio Access Network sub-service, one application on the second infrastructure. So for the terminology in what you will see in the demo, when we say service, we mean an end-to-end service that can be composed of different sub-services. So you can have Radio Access Network sub-service, Core Network sub-service, and application sub-service. And each of these sub-services will be deployed on an infrastructure and we call it VIM here.

You can see here the front end of our OSS. The users have this nice dashboard. And then from this front end, the users can manage infrastructures, the set of clusters, and also services. So for example here, this random user have access to three infrastructures. He can deploy services on these infrastructures. And then you can see for example that we have two services deployed. The first one is for example, it is composed of Radio Access Network sub-service and the Core Network sub-service. As you can see here, the Core Network is deployed on the second VIM, VIM 2, and the Radio Access Network is deployed on VIM 3.

Now at this step, let's see how this user will use OSSGPT in order to deploy his intent. So this is a OSSGPT interface. This is a debugging interface. Of course we have a UI that the users will use, but this is a debugging interface where we can see the set of agents and the communication between these agents. So the user introduce this natural language intent. This is an example, "create a new service with name demo service containing one radio access sub-service and this latter contains one KPM xApp using" and he can introduce of course some configuration parameters, and then this should be deployed on the second VIM.

So this is the intent and then OSSGPT will translate this intent to multiple OSS API calls and execute each one sequentially. And at the end, the OSSGPT will generate this report in natural language. You can see that this report confirms the creation of this requested new service. These are the sets of logs from the OSS that OSSGPT took. So in order to fulfill the user's intent, OSSGPT made all these API calls. So first, it called the OSS to get the list of VIMs, then post service create to create the service, then check if the service is created, then check if the NSD exists, if not, it requested the NSD expert and so on.

In the front end, you can see that this demo service now is created, which contains one Radio Access Network sub-service. And it is deployed on the second VIM as the user mentioned. It also contains this KPM xApp as the user mentioned and it is instantiated and running successfully.

So we've seen how the users use this OSSGPT to introduce intent and deploy their services, or activate their intent. After deployment, we have the intent assurance phase. So in the intent assurance phase, it will be a closed control loop at the infrastructure that will ensure intent respect the specified requirement. We have another term for this intent assurance, which is Zero-Touch Network and Service Management. And it is composed of three steps: first one is detect or predict anomalies, then identify the root cause of the anomalies within the intent, and then resolve these anomalies to ensure the intent respect the specified requirement.

For anomaly detection and prediction, we have a lot of AI methods that have been widely used in research. The problem here is that these AI are black boxes, which lacks explainability, making it difficult to extract the root cause of the anomaly. That's why we have XAI methods, explainable AI methods in the state of the art, to explain these AI decisions and therefore we can extract the root cause of the anomaly. But the problem here is that in the report, in the intent reporting, these XAI values are difficult to understand from users with little domain knowledge. And this of course removes trust in the IBN system. That's why in this approach we want to generate this report in natural language.

Our pipeline includes three steps. We use AI for anomaly detection and prediction, XAI to extract the root cause of the anomaly, and then we use an LLM agent that will generate this report in natural language to explain the anomalies and then resolve them autonomously. This is the general pipeline of the intent assurance. In this demo, we tackle one use case of one application deployed at the infrastructure that contains some latency requirement. We are using XGBoost as the AI to predict latency violations, SHAP as the XAI to identify the root cause whether it's CPU resources or RAM or both, and then Llama 2 to explain the anomaly and resolve it without human intervention. Of course, we are training them using in-context learning as well. So we are using a generic LLM, which is Llama 2 in this case.

So let's move to the demo. We have an initial allocation of 250 milli CPUs and 250 MB of RAM for this application. We are using Grafana and Prometheus for the monitoring to visualize resources. We are using Apache Benchmark to stress the application with a load of HTTP requests and then we monitor CPU, RAM usage and limit and also the LLM's output. You can see here the Grafana dashboard of this application with the CPU and RAM usage and limits. So and you can see on the right side the allocation, the initial allocation. And then at this step, we use Apache Benchmark to stress the application with a lot of HTTP load, so it will consume more and more resources in terms of CPU and RAM.

At this step, you can see that on the bottom, XGBoost, which is the AI, predicted an SLA latency violation. So we give this prediction to the XAI and LLM. And here you can see the output or the report from the LLM based on XAI values of course. The LLM said that the root cause of the anomaly was insufficient CPU and RAM resources. And to mitigate this, you need to update the CPU and RAM resources to this new value. So we take these new values and we apply them autonomously to update the allocation of this application. You can see here that this new allocation was applied in both CPU and RAM. Then we do this again and you can see here that XGBoost predicted also a SLA violation, but this case, the root cause was only insufficient CPU resources and this is of course from the XAI output. And the solution was increasing CPU to this value. And then you can see here that the new allocation was applied without human intervention.

Okay, let's move to the conclusion. You can see here in this figure that this was happened in the demo, in the two demos. So this is from the first demo where the users deployed or activated an intent in natural language. And then we've seen how the intent assurance phase works. These are the stages of the IBN: intent profiling and reporting are done in natural language and then everything else was done using the LLM agents. So the advantage here is of course that instead of for the user to use the APIs and generate NSDs, they use natural language to manage the network and this is very simple and straightforward. You can see links here, we have papers and demo on YouTube if you are interested in more details. With that said, I'd like to thank you very much for your attention.

Jean-François: Thank you Abdelkader. And we have time for one quick question.

Speaker 1: Hi, thank you for the presentation. I have some questions related to your metrics and actually the security of your approach. For the metrics, have you considered the 78% availability that the models might deliver as a top because of the agents, right? They will have like the hallucination and they will be this tags of failure that for example, for network, we usually don't consider a good resolution when it's not close to 99% of time. So how are you tackling the hallucination, actually the problem solving of the security part onto, you know, the studies that you did for your intents and your prompt development? How are you tackling this?

Abdelkader Mekraouech: Yeah, so thanks a lot for the interesting question. Security was not the focus, but we are tackling this for example with some validation agents. So for example here in the executor agents, it will generate this NSD, and therefore when generating this NSD, we will validate it with a script using Python to validate the semantic correctness of this JSON and also the structure. And then also we are using human in the loop for now. You can see here a tool called human validation. So this human validation will validate the executor actions, the critical ones in terms of if the executor action decided to delete something, call the delete endpoints or put, so we are using humans in the loop here to validate.

Speaker 1: Okay, so you're not totally autonomous as it is on the drawing. I wanted to ask you, are you planning to be totally autonomous and how are you planning to tackle the autonomous play into, you know, being able to do anything that the agents might understand into the network, right?

Abdelkader Mekraouech: Yes, actually we did some evaluation in enabling this autonomous 100%. And the results was that for complex intent, like if the user introduced this complex intent that require lots of API calls or lots of reasoning from the LLM, then we have approximately 80% of accuracy. So we are not in the 99%. We have a research paper on it, it is called OSSGPT, and we published the results in this paper. But I think that here we need to use more advanced LLMs in the reasoning part so that they will not hallucinate. We also need to include some advanced validation agents, Python, to remove this human validation. So we did some evaluation here. I invite you to see the paper, it is called OSSGPT, and we published the results in this paper.

Speaker 1: Thank you.

Abdelkader Mekraouech: Thanks a lot.

Jean-François: Thank you. We'll have time for one quick question because then we, for sake of time, we'll have to move. So I'm sorry for other people in the queue. Just one question and then we move to next speaker. And of course you can reach Abdelkader offline and also through the mailing list if you want to share your thoughts, comments and questions. So just one question from Muhammad.

Muhammad: Hi, thank you Abdelkader for interesting talk. I was wondering when user provides an intent, how and which part of the network actually translates into something that machines can understand, or converts into JSON or something like that?

Abdelkader Mekraouech: So when the user will introduce the intent, it will be in natural language. So for example, "deploy something on our infrastructure" in natural language. This is an example, "deploy 5G communication service on the most available parts of the virtualized infrastructure." Then we use AI approach, LLMs to generate the low-level configurations from this natural language. So we are generating the JSON structure that the users—so we are generating this JSON structure to include some low-level configurations of this requested service. And this is generated by our approach, OSSGPT. And then we give this JSON to the OSS, which is our management framework, and the OSS will deploy this on the infrastructure.

Muhammad: Okay, no, thank you. Thank you. I can't see the slide deck somewhere. If you can provide it in the mailing list or in...

Jean-François: Yes, you have all materials yes, of course in the data tracker. So do not hesitate to have a look at it and of course contact you. I think there is a nice discussion but we have to move on. So I would just invite the—thank you again, Abdelkader. Invite the next speaker. The next speaker is Yunze Wei.

Yunze Wei: Hello everyone, I'm Yunze Wei from Tsinghua University and I'd like to share our recent updated work of framework and automation levels for AI-assisted network protocol testing. Let's first recall our background. We focused on the problem of network protocol testing, whose task is to test protocol implementations such as switches and routers to ensure the protocol conformance, performance, security, etc. However, traditional network protocol testing methods is mainly labor-intensive. Engineers first need to analyze the protocol specification such as RFC documents, design test cases and convert them into executable artifacts such as the scripts for the tester and the configuration for the device under test, and then execute the test in the testbed, and finally analyze the report. The drawbacks are obvious. It has a very low efficiency with limited coverage. It also struggles to adapt to the rapid evolution of network protocols in the new scenarios such as the industrial internet, satellite networks and some modern data center networks. So we introduce our draft which contains three main parts. The first part of draft is a framework of the AI-assisted protocol testing method, which has four sub-modules. That is protocol formalization, test case generation, test artifacts generation and execution and feedback module. The input of our full framework is the raw RFC document and the output includes some test cases, test scripts, DUT configurations and finally the test report. We also introduced an automation maturity levels for the protocol testing. This provides a reference model describing the evolution from the full manual testing method to the fully autonomous testing systems. We think that this serves as the technology roadmap for the protocol testing automation. Finally, we also provide a concrete and LLM-based example for the automated testing. Then I will show our updates in our new version.

The first key update in our new version is the framework. We change the protocol understanding part to protocol formalization in our new version. In version zero, we use protocol understanding which is a loosely defined extraction process, but we think that this is a very, very important part in the whole test case generation and the whole workflow of the framework. So we emphasize it as a protocol formalization, which is a more rigorous process that can translate the natural language described RFC documents into some machine-readable structured formats. This helps to promotes the test case coverage in the subsequent test steps. So we emphasized it in the figure. It is not included in the last version in the past version. Yes, and we also remove the human intent from the input because we think that human input may happen in every stage until it grew to the fully automated system.

And we also make some updates to our LLM-based examples. Firstly, we introduce more details of the protocol formalization process. We first make the pre-process to the RFC documents which divide the RFC into some different types of function modules. We divide them into two categories: the basic and the logic. The basic function modules include some message formats and local data structures, finite machines, etc. These modules themselves can generate some test cases but also can serve as the foundation for other logical modules. And the logical module can include some event action rules, algorithms and error handling, etc. We also capture both the internal and external relationships. For example, the internal relationship is the relationship between some modules and some sections, which section supports another section in an RFC, and we also capture the external relationships, for example, which RFC updates another RFC or it extends another RFC. Yes, I think this process provides a more process guidance for the LLM and it reduced ambiguity.

The second update to our example is that we changed the test case generation module to a test case templates and we decoupled the test case templates from the parameters in different steps. We first let LLM to generate a test case template without parameters and then it can generate either the parameter itself directly, such as the typical values or invalid values and the boundary values, or it thinks that the input should enumerate in order to improve the coverage, so it can generate some code snippets to generate some test input. It can also decide whether to generate some code snippets to calculate the test oracles. You know that some test oracles need to calculate to get the precise value to judge whether the test is passed or not. So I think this method is a hybrid approach that combines both the model-free method and the model-based method and it's just a test I think.

And finally, we also update the test code generation module from a single agent system to multi-agent system. We use a core agent to generate the test code with some sub-agents handle the auxiliary tasks such as the test intent rewriting and bug fixing.

Then is the next steps. Our draft is very comprehensive to some extent, so we will discuss we are also discussing which part is the key part of the draft that are worth standardization. We will refine these parts with more details in the future. We're also discussing how to evaluate the test cases. This is also a key challenge in the AI-assisted framework that based on our investigation because it can guide our system to produce better test cases with more coverage or something else. So maybe this can collaborate with BMWG I think. We will also make more industrial collaborations. This ensures that our proposed framework will meet the practical real-world needs in the future. Finally, we sincerely appreciate your valuable comments and feedback to our draft. Thank you very much.

Jean-François: Thank you. Time for questions, there is—yeah.

Speaker 2: Yes, thank you, thank you for your work. And we all know IETF has so many protocols, so this work is meaningful and useful in practice, but I am just worried that the content in the draft if they are suitable for standardization. So that's my concern. So I have a little suggestion that maybe you can focus on the part that is suitable for standardization, like the protocol formalization part, like the template, but not the testing method. I don't believe IETF will do some testing method standardization. So thank you, thank you very much.

Jean-François: Just maybe to comment on that, it's presenting in NMRG. So I think if I understand one part of your comment, you'd like to understand what will fit to research, so research group like NMRG, and what could be into the IETF. So I think it's—if I understood what you would like to see, what can fit standardization, so in IETF, and what will remain here in NMRG.

Speaker 2: Because he mentioned the standardization, so maybe you can consider where to put forward the draft.

Yunze Wei: Yes, and thank you very much. We will consider your suggestion. Thank you.

Speaker 3: Ken Chen from Huawei. I think it is a very interesting topic. So my question is that will this cope with the complicated scenarios? You know, the bugs emerges with multiple protocols involved. So in complicated scenarios, will this cope with the multiple protocols?

Yunze Wei: Thanks for your question. I think it's also a very important scenario. Our current focus is on some more static and single protocol scenario but we will also extend our work to more dynamic and multi-protocol with more with multiple devices scenario, more complex scenarios, but I think it's another work maybe. Yeah, thank you.

Jean-François: Okay, thank you very much. Next speaker, please. Is Yunyuan. Okay, Natalie, thank you.

Natalie: Good afternoon everyone. My name is Natalie Roman from Deutsche Telekom, and I am here to present the draft "Applicability of MCP for the Network Management" on behalf of the authors team.

As a short recap, we presented this draft for the first time in IETF 120 in the NMRG meeting and we also had a follow-up in the subsequent NMRG interim. We have here the GitHub link with which we track the progress of the draft and the corresponding issues. So now what is this draft about? I think most of us know about MCP and we're also quite aware about the rapid adoption across both startups and enterprises, mainly for use cases like AI coding assistants, database query, data analysis, and in general to provide AI agents access to tools. So what we're trying to do with our draft is to explore how MCP can be applied in the network management, and we're doing that by addressing different aspects that we list here. First one is the high-level challenges. Second is how MCP can be used for network exposure. The third one is how MCP discovery could look like, what are the deployment scenarios, and what are the architectural requirements.

Since the last version that we presented, we added significant context to the high-level challenges, the deployment scenarios, and the architectural requirements. So I'm going to go through those here.

Well, when it comes to adopting MCP for network management, we see challenges in two main areas. The first one is protocol design and the second are the security considerations. Regarding the protocol design, there are different problems. The first one is the lack of an entire error handling mechanism enforcement. MCP right now has some basic error codes but they are limited to discovery and invocation and they are not applied for example to an entire life cycle management. The second problem is that the protocol relies on server-sent state events. That means it is stateful and this makes complicated load balancing, and for remote servers network latency and instability can cause problems. And the last one is context handling, because whenever there are large lists of tools available on one MCP server or when there are multiple MCP calls, the amount of context and the amount of tokens consumed by this is big, and this can end in degradation of the LLM performance and degradation of the reasoning mechanism, which leads to even more ambiguity and uncertainty. For the second area, that was security consideration, of course we have the different attacks that malicious actors can execute, for example, prompt injection, tool poisoning, tool shadow. We also have the lack of a security enforcement mechanism that is inherent to the protocol. So far MCP relies on external implementations for authentication and authorization. And of course, we also need a well-defined identity management that distinct requests from users, from agents, or from sheer system.

The second part where we added some updates to the draft were the deployment considerations in the adoption of MCP. We described four different deployment scenarios, which are the standalone MCP server to expose APIs, the network controller and network element communication using MCP, network element intercommunication, and how the network controller can consume APIs or data sources from external services also using MCP.

Here's our first scenario, which is the standalone MCP server. In this scenario we have a network controller which has an integrated network management engine with its corresponding MCP client. And we define the MCP server to be deployed standalone on top of the network elements. Now here we have two options: either these network elements are regular normal network elements and the MCP server acts as an adapter between the MCP protocol and the standardized network configuration protocols like Netconf, or these network elements could be also evolved versions that can directly communicate to the MCP client.

In our second scenario, we define how the network controller can communicate to the network elements, but the difference with the first scenario is that here we describe the MCP server to be deeply integrated into each network element. That of course can consume some computational resources, so it has some drawbacks, but it is a scenario that can be considered.

Our third scenario goes a little bit more in the future and considers network elements that already have some intelligent capabilities. So here we foresee agents embedded into the network elements, probably powered by small language models, and inside each network element there should be an agent and an MCP client and an MCP server that enables the communication between both of them. Of course, this also enables the possibility that a human operator can directly use natural language to communicate directly with the network element.

And last, I think this figure is just self-explanatory, it's just how MCP can enable the communication between the network controller and external or third-party management systems. So in this scenario, again we have the MCP client in our network controller and MCP servers exposing third-party tools or data.

And the last update was regarding the key architectural requirements. Here we defined three main key points. The first one is that we believe that we need functional-specific MCP servers. This is to maintain appropriate architecture and appropriate performance when the amount of tools is growing. And therefore we think that the server should be categorized by network management function. For example, typical categories can include network log analysis, device configuration management, energy consumption, etc. The second key point that we defined in the draft is the secure and scalable architecture. Of course, we need to enforce strict access controls, limit MCP operations to authorized AI models and to scale efficiently. And finally, we defined that MCP implementation should support LLM-coordinated automation of real-time diagnosis, fault remediation workflows, and other common management operations to reduce operator workload.

That's it. Those are all the updates that we did to the last version of the draft. And now we would like to ask to the audience whether this work is interesting for NMRG and whether this draft is also ready for group adoption. Thank you.

Jean-François: Okay, thank you. Yes, we have questions, so please go ahead.

Speaker 4: This is Minje from Zhongguancun Lab. I have two questions regarding the deployment consideration in page four. In your second consideration, you propose to use the MCP for the communication between network controller and network element. So my question is how do you consider the relationship between MCP and existing protocols like SNMP and Yang model? The second question is about your third consideration. You propose to use the MCP for the communication between network elements, but in my understanding the MCP is used for the agents and tools or functions communication. Why do you consider to use the MCP in this scenario? That's my question. Thank you.

Natalie: Okay. For the first scenario, maybe I can use this slide. If I understood your question, this was how do we see the integration of normal network protocols like Yang and others. So this was something that I mentioned that here the MCP server can act as an adapter between the MCP protocol and regular configuration protocols like Netconf, like Yang models and so on. So this server could be the translator in a layer above the regular network element.

Speaker 4: Okay. Thank you.

Speaker 5: Hi, thank you for your presentation. And this is Haoran from Legis Lab. And according to your slide, the third page, you list some malicious actors like prompt injection, tool poisoning and shadowing as security contents. And I was wondering is there any network management specific threat model for MCP or you just plan to define in this field rather than only relying on the general large language model security discussion? Yes, thank you and that's my question.

Natalie: Thank you for the question. So for the moment, I think that the security threats that affect the general MCP protocol of course also affect to the network management. But as soon as we start experimenting with implementations, of course I think there is a possibility that there are other specific threats that only apply for the network management, but I think we need some more research for that.

Speaker 5: Okay. Thank you.

Jean-François: Thank you again. So we have to stop the questions now and you can continue offline or just after the meeting. Thank you again.

So the next speaker is online or is not here. I think it's online, right?

Shailesh: Yeah, am I audible?

Jean-François: We just prepare the slides. And I will give you the control. Yes, we can hear you.

Shailesh: Perfect. Yes, yes. Okay, good afternoon everyone. My name is Shailesh on behalf of the authors and contributors. I'd like to present this work on the applicability of agent to agent to the field of network management. So this work explores how multiple agents or A2A interfaces with network management, especially in the multi-domain and multi-vendor networks that rely on IETF technologies. So what I'll do is as part of this session, I will go through the A2A concepts and explain why this is important to the network management areas and also come up with our proposal of integrating the Yang-based structured data into the A2A communication.

So A2A is basically an open protocol for communication and collaboration between the AI agents. And so these AI agents exchange something called an A2A messages with each other and that's the single turn of communication between the agents. And each A2A message has multiple one or more parts to it, and each part is a container that has the actual content. So a part can either contain text or binary data or a URL or a structured JSON data. So conceptually, an agent can send a message that has multiple elements to that, including a natural language description or an artifact that is generated during the processing or it could also be a structured data that actually describes the task. So as you can see in the image here, so the network operator interfaces with the client agent here, and the client agent then communicates with the remote agents using the A2A messages, and they actually do the network operations including the configuration, troubleshooting and the monitoring part.

So within the telecom domain, there's some work happening at the TM Forum on A2AT, which is basically extending A2A to the telecom use cases. So A2AT is proposing something called the structured prompts for network operations. For example, in this fault diagnosis task can have structured sections like task descriptions, target objects, environmental information, constraints, expected outputs and so on. So these structured sections help better understanding and better interpretation by the multiple agents. So that improves the readability perspective and the validation perspective from the agents' point of view.

However, there is still a limitation to that because while A2A defines how agents talk to each other, it kind of lacks defining standardized semantics for the data exchange between them. Most of the A2A messaging is still largely dependent on the natural language descriptions and not the human or the machine readable forms. So that is where we propose as part of this draft a Yang-based structured data for agent communication. So the whole idea is to integrate the IETF Yang model data as part of the A2A message itself. So as we know, the Yang provides a well-defined hierarchical machine interpretable representation of network data. So integrating Yang with the A2A messages brings in several advantages to us. Firstly, it provides a very clear and machine parsable definition for network operations. It also provides seamless integration with the existing IETF technologies, for example, Restconf or Netconf or something like that. And more importantly, it brings in a human machine synergy because a part of the A2A communication message is the natural language description which is good for human readability and the second part of it is the Yang-based structured data which is good for machine readability. So that way it brings in very good human machine synergy into the aspect.

So here's one example which I wanted to showcase which is an A2AT message example for a network incident model. So as you can see the first part of it is the natural language for the human readability. It says something like, "please diagnose the service degeneration incident for optical service A in fan domain." So that's for the human readability part. The second part of it is actually the Yang-based structured data, which explicitly mentions the fields like what is the incident ID, what are the affected service instances, what is the domain, what is the priority and things like that. So I mean we have used the network incident Yang model that is defined in the NMOP working group. So this provides the flexibility and the clarity as to what needs to be validated by the machines, what needs to be interpreted by the machines. And as you can see in the diagram, the client agent picks up the operator's intent and then the fault management agent is the one that actually does the fault management operations, and it can use the tools and APIs via an MCP server to do so.

So this is one of the key updates to this version of the draft and there are several other updates that went in in the 02 version of the draft including the operator's operational considerations for integrating A2A into the network management scenario. So what I'd request the research group is to go through the draft and we very much welcome the feedback and any comments that you may have. And I also wanted to check with the chairs if the research group NMRG is interested in a draft like this. So that's all from my side. Thank you.

Jean-François: Thank you. Thank you. One question.

Speaker 6: Thank you chair. Again Xing from CAICT. One question. And the advantages of A2A communication maybe they are more flexible and they can support to communicate with each other through the natural language and not only the not only the information defined in Yang model. So if we ask the agents to communicate with each other which must have the Yang model information, I'm worried what if we want to we want to interact with each other about the things that are not defined in Yang?

Shailesh: Right. So that's a good question. So basically as part of this draft, there is also a part of the draft that says that each agent exposes something called an agent card, which exposes its capabilities and what are its configurations, what can it do, what it can't do. So as part of that, if the agent card says that this particular agent has the capability of parsing Yang or enriching Yang as a structured data, then we delegate that into that agent. So not all agents may be capable of that. So based on the agent capabilities, we choose to delegate the network operations and functions or the Yang-based operations to specific Yang to specific agents based on their capabilities. So that's part of that draft.

Speaker 6: Okay that sounds reasonable. Okay, I will email you if I have any further thoughts. Thank you. Thank you for your work.

Jean-François: Thank you very much. So we have to move to the last presentation. Thank you, Shailesh.

Shailesh: Thank you.

Chan-Ho Jung: Hello everyone, this is Chan-Ho Jung. I'm editor for Intent-Based Networking Use Cases draft. Next slide please.

Thank you. So I have updated this draft from the Jerome's valuable comment, so I believe I addressed all comments. Okay? Okay this one is the difference between version 2 version 3 table of contents. The major updates are four. The first one is usage of IBN methodology. So which means methodology we have IBN construction, how to apply like intent translation, intent verification, something like that. So section 3.5.9 so I made a table, so we have nine use cases. So construction number means from intent translation, intent verification, so we have eight steps. So what kind of step are used for each use case.

Second one is intent classification RFC having intent taxonomy. So I addressed taxonomy. Like left-hand side we have intent taxonomy, given intent. It consists of seven kinds of intent components such as intent solution, intent user type, etc. So right-hand side we have the table each use case what kind of intent taxonomy component used, so I analyzed.

And the third one is enhancement of section. So we try to make all use cases text have the same level. So that is okay last one. Okay this one. So balanced and consistent explanation among use cases. So especially section 3.2, IBN for guarantee service level agreement is quite large text but I shrink and then try to make consistent with other use cases. So you can see I explained this traffic monitoring system according to the intent lifecycle system diagram.

And then the lastly, so enhancement of practice learning section. So the previously we have practical learning for service function chaining SFC. This version I included cloud-based security system with I2NSF. I2NSF stands for interface to network security function. This one is for cloud and edge security service. So I was the editor for these interfaces, so I applied. That's all. Okay.

Okay I see. So finally, so we addressed all comments of our NMRG chairmen, German, and we believe this draft is has good shape. I please read this draft and give us your valuable comment. Hopefully we can address all your valuable comment by next IETF meeting. And then I would like to ask last call in IETF Vienna meeting. Thank you. That's all. Any question, comment?

Jean-François: Thank you. So question, comments? Also I sent email to our group so please respond to my email. So thank you very much for addressing my comments not as chairs but also as participant to the group. I really appreciate and yeah as you proposed I think the plan you proposed is good so. And yeah it will be good that if people can give feedback. And so of course there will be then let's say a kind of formal last call but before the last call we would like to also and I think for the author to have the feedback so that you can really improve the draft if needed before we go for this last call.

Chan-Ho Jung: Okay. Thank you. Appreciate.

Jean-François: Thank you. Okay thank you.

So thank you very much to the all the presenters. Thank you very much for attending, participating. And this will end the session today. So finally with two minutes in advance. Yeah. So and I hope to see you around of course and of course on Thursday session. Thank you very much and have a nice evening.

Session Date/Time: 19 Mar 2026 08:30

Jefferson Campos Nobre: Good afternoon. This is the NMRG session, the second session of the week. My name is Jefferson Campos Nobre and I'm a co-chair of the research group along with Jerome Francois. And the secretaries are Pedro Martinez-Julia and Cao-Wei Cee.

As you may know, the IRTF also follows the IETF IPR. So by participating in the IRTF, you agree to follow the IRTF process and policies, and you have more information on that on the RFC 5743.

Besides that, also considering the Note Well, we have audio and video recordings of the meetings. So if you prefer not to be photographed, you will need to put that on your badge. Otherwise, if you speak at the microphone or appear on the meeting, you consent to appearing in recordings of you at that time. Also, if you are participating on the meeting online and you turn on your camera and your microphone, you are consenting to appearing in such recordings.

Again, in the IRTF, we have the privacy and Code of Conduct. So as a participant or attendee to any IRTF activity, you acknowledge that written, audio, video, and photography recordings of the meetings may be made public. So you can find the information on the privacy policy of the IRTF online and also concerning the RFC 7154 and 7776, which also apply to IRTF.

Okay. As a research group, the NMRG is part of the IRTF and in this organization, the focus is on long-term research issues related to the internet. So it's a parallel organization to the IETF. So it's good to say that it's not a standard development organization, but we conduct research.

So in this context, while the IRTF publishes informational or experimental documents, our primary goal is to promote development of research collaboration and teamwork, considering, of course, internet technologies. Also, you can have this information on the RFC 7418.

Some useful links for the meeting with materials, the MeetEcho, and also I would like to ask you to use the onsite tool in order to be present virtually on the meeting so we can have, you know, an accurate count of participants. Also, we have some notes, so everybody can check that and also help with the notes. And also, the video recording of this meeting will be published online on YouTube.

Okay. So this is the agenda. So we have a really packed agenda for this second slot. So I would like to ask for the presenters to mind that the time considers the presentation and also the Q&A. So please be respectful to the time defined for pre-presentation. So we'll start with some quick introduction to IDN, then some presentations on network digital twins, and finally, we have some presentations on AI.

That's it. Also, on Saturday, we will have an interim meeting with ETSI ZSM. So this meeting will be hosted in this hotel and we will discuss several topics considering which are of interest of the NMRG and the ETSI ZSM, but especially those related to Agentic AI and the relationship with network management. So you are all invited for that and if you can, if it's possible for you to attend the meeting, please send email for the NMRG chairs, for me and Jerome, in order just to have the count of the participants onsite. It's important also to say that remote participation won't be available for this meeting. So again, if you can make it on Saturday, we'll be happy to have you on this joint meeting.

So the first presenter will be Hanling Wang. Okay. You can go.

Hanling Wang: Thank you. Good afternoon, ladies and gentlemen. My name is Hanling Wang from Pengcheng Laboratory and today, I'm going to give two short presentations of two drafts that we submitted recently. And actually, this is my first time to submit an IETF or IRTF draft. And these two drafts were initially submitted to the CATS working group, but later we found that these two frameworks might be better suited for the NMRG working group, so I present it here. Okay.

My first draft is about Intelligence Distribution Network (IDN). Okay, some background. We think that currently the network is undergoing a new paradigm shift. The core goal of internet is shifting from establishing connections, distributing content, towards providing intelligence services for users, especially from 2023, the prosperous of the large language model. And currently, the main target of the internet has become to provide the intelligence service including services like AI agents, chatbots, generative AI, etc.

But we think the current bottleneck of current AI inference framework is that the demand for intelligence originates from the network's edge, for example, the cars, the phones, the laptops, while computing resources are concentrated in cloud data centers. The current paradigm of data transmission via network plus centralized cloud inference leads to poor user experience, high network pressure, underutilized computing resources, and high security risks.

So, in this framework, we want to enable computing power to flow like water and electricity. This is our ultimate goal. The user of computing power becomes plug-and-play and elastically supplied through unified scheduling. But actually, the computing power cannot flow.

So our idea is that the Content Delivery Network (CDN) enables content to flow like water and electricity. So how can the next-generation network architecture enable computing power to flow in the same way?

So our key insight is that computing resources cannot flow, but actually intelligence can. So by building a hierarchical intelligent computing interconnection architecture, which is our IDN, we can cache popular intelligence services. And similar to CDNs, these intelligent service capabilities can be pushed into distributed computing networks.

But there are several key challenges, because AI models have complex structures and cannot be arbitrarily decomposed. They require substantial computing resources, making deployment difficult. Intelligence service requests are massive, dynamic, and heterogeneous, which makes it challenging to guarantee high quality.

So in this framework, we proposed in totally six components. The first is about identification. It solves the problem of how to decompose the large models into structured objects characterized by service type, capability boundaries, and required resources. And the second component is about access. It solves the problem that how each computing device can join the computing resource pool to provide their own computing capability.

And the third component is about deployment. It solves the problem that which model shall be deployed on which device. And the next is about routing. It solves the problem that given a user request, how shall we handle this request, in which node shall we route this request to, just like in the CDNs.

And the fifth component is about caching. It aims to improving the performance optimization by caching the data and the intermediates of the models to improve user experience. And the last is about security. We also want to ensure the data privacy and the model security of the framework. So actually, that's all about the first draft. Now let's move on to the second slide.

Okay, thank you. This is my second one. So actually, the second one is a bit related to the first one, but it's not totally the same. The second is about an Open, Decentralized, and Scalable Framework for Large Language Model Inference. Actually, this draft is inspired by blockchain or Bitcoin.

I think I can skip the background because it's the same as the previous draft. So we think that the centralized inference is not enough because the centralized inference frameworks always assume that the inference structure is high bandwidth and low latency, and it contains homogeneous GPUs and tightly controlled scheduling, and a trusted execution environment.

And actually, this is the optimal solution for low latency and high throughput, but accuracy alone is not the whole problem. In inference, we also want to deal with elasticity, peak cost, cross-organization and trust-constrained inference, computer ownership and access, as well as geography and tail latency. So we believe that the centralized inference is not a one-size-fits-all solution.

So what is expected for LLM inference? We want more flexible cost structure, active computational participants, decentralized computer access, and higher execution elasticity. So we believe that the distributed LLM inference is the future paradigm, but actually this is not that easy.

There are several key challenges. The first one is that the distributed inference is strictly harder because it has the LLM inference has strict layer-wise dependencies, and it has millisecond scale per-layer deadlines, heterogeneous, unreliable, and untrusted nodes within a distributed environment. And there is no, we want that there is no global scheduler or trusted coordinator. So how can the inference remain correct, timely, and alive under these constraints?

The first core challenge is about the activation delivery problem, where we need to transmit the intermediate or we called activation between the layers among different computing nodes. So why existing protocols are not efficient? Because the TCP/QUIC protocols has no notion of deadlines or dependencies. While RPC frameworks retries overlay violate inference timing. And generic P2P overlays ignore compute and latency constraints and collapse under churn.

The second challenge is about security and incentives, because the nodes in this distributed framework can be owned by different entities. And we cannot assume correct execution. And detection alone does not ensure honest behavior. So inspired by the Bitcoin and blockchain systems, we use the ideas of cryptographic identities, verifiable actions, costly misbehaviors, and economic incentives to build this system.

So in this system, there are mainly three components. The first one is about a layer-aware transport protocol, because LLM inference is sequential, latency sensitive, and data-driven. The transport protocol understands model layer boundaries, execution order, and per-layer deadlines.

The second is about a coordination protocol for heterogeneous peers. Because peers differ in compute speed, memory capacity, network latency, and availability, the coordination must be decentralized, adaptive, and predictive.

The third is about an economic protocol for rational participants. Because nodes are not altruistic, compute and bandwidth have real costs. The economic protocol defines who is allowed to execute inference and how rewards are distributed. So that's all about the short presentation about these two drafts. We welcome any feedback from the community. Thanks.

Jefferson Campos Nobre: Questions? So I have one, then you, Jerome. Thank you for the presentation, Hanling. One question is, since this document was first submitted to CATS, my question is how do you see this document fitting into NMRG?

Hanling Wang: Okay. Thanks for your question. Actually, I think these two drafts are about frameworks. Actually, this framework can contain different components. And in my presentation on Monday at the CATS working group, they also suggest that because this is a larger framework, and the CATS working group are more about the routing side, and I think the NMRG working group is more about how the network can be organized and managed in current like, for example, intelligence services. So I think this whole framework may be better suited to the NMRG group, and especially for some components in this draft, for example, how the models can be distributed in a similar fashion to CDNs in the network, I think these components might be interesting. But I'm also willing to hear the chair's suggestions. Thanks.

Jefferson Campos Nobre: No, but speaking as an individual, I think that for me at least, when the documents and the presentations are made to the NMRG, I think that one important thing is to think about the research, which is the research question, the research initiative that you're trying to perform.

So I believe that at least at this point, the draft and as long as your presentation, they are more driven towards something that it's not research per se. So maybe something that you can account if you want to move along with the draft in NMRG. So maybe to think about that, I believe it to be a good idea.

Hanling Wang: Okay, I think we'll, maybe we can discuss about that later. Thank you.

Jefferson Campos Nobre: Okay, thank you, Hanling. So we're going to the next presentation. So the next presentation will be performed by Marco.

Marco: Hello. Can I proceed? Okay, of course. Yeah. Alright. So network digital twins. Actually, we were not crystal clear about how much network digital twin is still on the agenda of NMRG. Nevertheless, we saw there is another slot today, which is good. And the group worked hard on an architecture document. Some time ago, we published a complementary document mainly focusing on challenges in developing, deploying network digital twins to foster discussion. And we got a good amount of feedback and discussion based on that.

So the draft expired, but nevertheless, we wanted to take the opportunity in a few minutes to brief you on our hands-on experiments and experience here. Some of these results have been published at IEEE conference referenced here, which will be published in May. So if there is more interest, contact me or have a look to the paper.

So quick recap on the ID. So it was meant to complement the group's NDT architecture ID and we sketched a little bit of an extended reference architecture to perform analysis, description of challenges, looked at different consumer expectations, architecture variants, and associated operations.

So we tried to look and analyze on some NDT principles, best current practices, and practical aspects mainly derived from hands-on experience in three use cases. Some of them have been developed, some of them have been just sketched and experimented with. So that covered an SDN kind of network twin, a QKD twin which has mainly been done by one of the co-authors at the university from Martin, and we focused on the network twin of 5G mobile systems. So particularly on that case, there was a lot of interest and discussion and questions. So let's take the opportunity to let you know what we did here.

So from the draft, how we sketched how a network twin could be useful. So we compared a little bit about offline network twins, real-time network twins. Mainly the use case is to comprise a virtual replica of the physical system, to do experiments or to link them and to have real-time analysis and prediction of analysis or events which may be useful for well, failure handling, whatever.

So the model shown here is pretty abstract. So it has a 5G system in the middle with models of different network functions from 5G, has RAN emulator to generate traffic and increase pressure to the network. And what we are interested in is getting output from the model in terms of prediction regarding operation sequences, but CPU utilization, memory utilization.

So based on that, we did some experiments and developed a prototype. So different from open source, we used a production-grade 5G core, which is very different because a single 5G network function is not made out of one function, but various microservice functions which are deployed in containers. There's dependency in between them. And what we wanted to study is in particular something called cascade failure.

So if you have one network function or one microservice function which is overloaded or fails, it has a dependency on another one, which makes it stalling, failing, whatever. So that's what we wanted to study to prevent actual huge failures and counteract in advance and not reactively.

So we deployed this production-grade 5G core in AWS, Kubernetes services. We made use of AWS S3 for as a storage to handle a huge amount of logs that the system generates from Kubernetes point of view, from the 5G core point of view. So we collected the data and we developed an ingestion pipeline, mainly to process data from different sources, normalize them, because the data being generated is very diverse, right?

So we published them through an Influx database to the network digital twin as we show in the next slide. And for visualization, we use common tools like Grafana dashboard. So here we could directly investigate CPU, memory utilization, network KPIs, and associated prediction which we achieved with the Prophet model.

The predictive engine, so we tried different AI machine learning models here, but we're very happy with the Transformer model because of its characteristics of attention, etc. So we focused on first looking at a forecast of pod-level CPU utilization and alert likelihood.

So this is the system that we developed. Not all of that has been developed, but that's a let's say closed-loop kind of architecture that we had in mind. Most of that has been developed and deployed. So as you see here, there's the 5G core functions being deployed in AWS. Whatever is needed to really operate the network. To have really stress on the 5G core, we use the proprietary emulator on the user equipment and RAN side to really put pressure on the 5G core.

And as you see here, so there is a lot of data being generated from Prometheus, from the 5G core in terms of logs, in terms of events, in terms of alerts. So all of this diverse information we push towards a preprocessing pipeline to extract the data, to collect it, and to normalize it and then publish in Influx database.

So the point here is, in the end, we made use of 60 gigabyte of data and having kind of 500 plus different features that we used to train the machinery. As you see, Grafana makes use of this data to directly look into particular attribute values and the machine learning kind of models give us the predicted numbers, which can be used by Grafana to analyze or directly fed into the policymaker. So for a closed-loop automated system, the policymaker is made to instead of reactively fix a problem, but to change configuration and proactively avoid the problem before it happens.

So last but not least, very short here. So in the paper, you find more information, pretty much details on which settings we chose to train the Transformer model. On the left side, sorry it's a bit small, but it shows the complexity in how one network function, so we chose the Access and Mobility Management Function of the 5G core here, it's an AMF, which communicates with other functions, SMF, UDM, but also here the blue bubbles are some of the microservice functions that make up the AMF.

So we have a dependency here and we actually trained the model with many, many features, not only associated with that particular pod operating one particular function of the AMF, but with all of them. So that gave us pretty good results as you see on the right side. So the typical models with epochs and the accuracy that you achieve in the view of training versus validation.

So in the end, we used 50 kind of epochs to get good accuracy. And one of the diagrams we want to show is what you see here on the right-hand bottom, which is a blue line gives you the real data and we had an overlay print of the predicted data. So depending on the window size that you choose, how much data you want to predict, you get a pretty good and accurate kind of figures here. A little bit smoother compared to the realistic data, but we were very happy with these results.

And so this was only prediction of the expected CPU utilization for one microservice function of the AMF. So it's a modular system. You can deploy models for other microservice functions and other network functions of the 5G core to get the same results. And the point is that this model allows you to really get the correlation between the load figures, metrics of the different network functions, and not only of one. So this correlation figures is very important to take into account to get these kind of accurate prediction results.

Alright, that's actually it. So this is a starting point we want to further investigate various aspects here. It's a big topic and as we still believe network twins, there's no one single overall solution, but this is one chosen particular approach that we wanted to investigate with real production-grade software. So maybe we have 40 minutes for remaining question, otherwise, I'm available for the rest of the week. Thank you.

Jefferson Campos Nobre: Thank you, Marco. Questions? Okay. Chang, you can go.

Chang: Hello. Chan from China Mobile. Here quick question on your practice of NDT on 5G core, right? So what's the difference between offline NDT and you mentioned the real-time NDT on perspective of time cost or decision latency?

Marco: Well, the difference from the operation point is offline is so you train a model to have a good replica that behaves like the physical system, but you don't have it linked. So no real-time synchronization of physical system status with a replica. But you can set the replica into a state but then do experiments like so what happens if, right? What happens if I increase the load here or there? So what is the result? So this is for analysis.

For a closed-loop system, you probably prefer kind of real-time system, which means you link the network digital twin with the physical system and have continuous alignment of the data of the twin and the status of the twin compared to the physical system. And this is can be used to predict certain situations which are critical and then perform via the O&M system or orchestrator some reconfiguration to avoid a failure before it happens.

Chang: Okay, a follow-up quick question is that the proposed Transformer-based machine learning or AI method, right? Is that method can meet your real-time requirement in most time?

Marco: When the model has been trained, which is of course done offline, it was pretty fast, yes. So training, we made use of a big server with GPUs. And for the operation, it was fast enough, yeah.

Chang: Okay, thank you.

Marco: Sure, you're welcome.

Jefferson Campos Nobre: Thank you, Marco. So the next presenter will be Chin.

Chin: Thanks, chair. So good afternoon. So the topic I'm going to share today is Network Digital Twin and Agentic AI-Based Architecture for Network Operation. So Michael, you know, more focus on pure network digital twin. So what we want to do is build on top of network digital twin, we can use agentic AI to, you know, to support fully autonomy and realize this intent-based network.

And so Agentic AI is a hot topic actually. It has been involved in the same way as autonomous networking level. We know autonomous networking level is the methodology developed by TMF Forum, you know, used to measure how we can move from manually operation to fully autonomy.

And with the introduction of this agentic AI, you can see, you know, it has migrated from large language model to large reasoning model and large action model. The key capability introduced, for example, reasoning actor, you know, to support you know close loop, you know problem solving, multi-stair procedure. And also with, you know, large action model actually it supports self-managing capability, for example, self-planning. So with all these capability actually it really drive, you know, moving from helper operating to moving from level three to level four. So we can see, you know, this level four break down into phase one, single domain autonomy, focus on single scenario autonomy to phase two end-to-end autonomy.

So this background. So we I think in IETF talk a lot about internal agent. Actually here we want to focus on network management agent. So this refers to, you know, definition from TMF Forum. They give a definition key characteristic of autonomous network agent. So for IP network domain, they more focus on, you know, fixed application scenario such as fault management, network optimization, network change management. And you can see, you know, some other, you know, high-value scenario developed by TMF Forum.

And for autonomous agent actually, the key principle they they follow is single domain autonomy and cross-domain collaboration. This actually, you know, more, you know, match with intent-based networking concept. This can be used to realize intent and combined with these, you know, in addition they introduce the multi-task collaboration between network management agent. So this can support proactive or reactive model and and also we can see these task actually involve some ambiguity uncertainty.

And for autonomous agent actually, they need to communicate to address this kind of, you know, challenges. They need to ensure the accuracy and efficiency of the structured data for determinists task. In addition actually, it will, you know, support natural language interaction. And yeah, in addition also we don't want to, you know, bring a radical change to the existing protocol. They need to compatible to the existing legacy network management protocol. This will help you to seamlessly integrate.

So this key challenge for Agentic AI on IP network management. And you can see this Agentic AI for stack, you know, from basic level to protocol level and you can see we have MCAP and so they will consume various different, you know, network service API. So they can expose to the upper layer so we can have agent level. So they can use A2A or some other agent communication protocol to, you know, to communicate.

So key challenge, you know, here we list top three challenge. And for example, first it lack, you know, domain-specific language which can be better consumed by AI agent or large language model. For example, it lack semantics context. Therefore, AI is hard to do the better reasoning about the intent. And and also the data quality and is crucial. And today actually many cases actually data build the silo. You need to generate data from, you know, various different data source or various different type of device. And also we really need to build, you know, bridge between the, you know, data engineer community and network engineer community. And so this need to, you know, have professional with the cross-discipline expertise.

Second challenge related to the trust security. For example, this can be broken down into three category: how to interact with human, how to interact with tools API, how to interact with AI agent. And I I think take the interaction with the human as example. The, you know, agent may be consist to, you know, manipulate human to do the covert worker. And and also human may be, you know, transfer or delegate some authority to agent. Agent may be overauthorized. I don't want to, you know, go to the detail about this security part.

For the third challenge related to the protocol among the agent. What kind of protocol can be, you know, used to, you know, between the agent or tools and model? And in the network management field, I think, you know, this actually already be, you know, highlighted by the TMF Forum spec actually. For example, high-risk operation, one example you can do the bunker network change, this maybe bunker deletion, this kind of high-risk operation may lead to the large-scale outage.

And also timeliness requirements is also a crucial, for example you for for the agent communication protocol today actually it's hard to support, you know, pub-sub based mechanism or event-driven mechanism. And also for the collaboration reliability is also a big issue.

So here is example in the for the, you know, Agentic AI architecture. You can see we can break down into network element agent level, network agent level, and service agent level. And network element agent level, we we can see the today's network element has already moving from traditional network element to the more modernized network element may, you know, embed some, you know, AI capability. We call them lightweight AI. And they may also support some, you know, more sophisticated, you know, capability, you know, to help you to provide better awareness or perceive some, you know, security risk or to identify some, you know, application traffic. So this kind of capability.

And on network agent level actually, it comprise multi-agent system, network digital twin and agent gateway. So network digital twin actually provide, you know, data foundation, you know. They can, you know, provide data to the multi-agent system they can better consume this kind of data. For agent gateway they can provide, you know, unified registration, security, and observability. And so at the service agent level, we can have service center related agent or operation related center.

We allow these service-level agent to collaborate with network-level agent, so we can support cross-level collaboration or cross-domain collaboration, you know, between these agent. And in the right you also see different level agent can go wrong actually. How we can make sure they do the right thing? So we can need human-on-the-loop. So you can make sure, you know, there's some unexpected situation happen actually so human can get engaged to, you know, to intervene.

So here's the whole solution look like actually I just want to emphasize this agent not, you know, internet of agent is network management agent. We allow network managing agent using the agent fabric. So this agent fabric will, you know, build the, you know, connectivity between the between the these network managing agent or agent gateway and so they can register the the agent to the agent registration center and also with their, you know, capability and tools resource and in addition we can, you know, introduce the trust data gateway provide, you know, security trust and for example human-on-the-loop I in at the beginning I mentioned.

So this when agent, you know, execute the operation, we need human operator to review the decision to intervene only when for this exceptional high high-risk situation. And also we can use this trust data gateway to provide, you know, privacy, you know. You can, you know, support the runtime anonymization, this kind of feature to, you know, really protect, you know, these privacy. And observability is also a important issue. This also related to human-on-the-loop. So you can, you know, make agent to declare audit level autonomous flags. So you can better trace, you know, these agent whether they go wrong and where they, you know, make a mistake.

So and we also give a, you know, two related scenario and, you know, we call them domain-specific autonomy. And they actually can be broken down into two cases. One is we at each level, for example network level, we only have a single agent. For example, we have IP network domain AI agent, so this will take input from service AI agent. So they can, you know, set the goal and break down this goal into several task and so they can, you know, interact with MCAP server understand what kind of, you know, tools or resource they can use, so mapping the task to each, you know, the the tools functionality so they can invoke MCAP to fulfill the task.

In the right side we can also at the same level can, you know, allow multi-agent collaboration. This here is still network managing agent. So very similar to the, you know, use case one actually, we we don't use, you know, MCAP server, but we, you know, can see a you can see this like network controller function network operation, you know, power of the network controller can be, you know, refactored as, you know, task agent actually. So this you can see this there's two level of agent. So the IP domain AI agent can be see as the command agent, the, you know, task agent can be see as expert's agent actually. So so they can, you know, interact with each other.

So this actually another cases and you we really want to provide service assurance, you know, to deliver IP private service. So you can, you know, use this kind of for example for IP network domain actually they can consider to interact with not just, you know, quality optimization agent but also the fault agent. But they will distinguish which situation they and to choose which, you know, task agent. For example, in case of service degradation, they will, you know, only select quality optimization to, you know, to to fix this issue or to build the close loop.

In case of fault scenario, they will, you know, call the fault managing agent. And we also see, you know, network digital twin platform can connect all the data from network element. They can use traditional protocol, you know, telemetry open telemetry and feedback to the task agent and also domain agent. So domain agent can use this, you know, to do the service verification. So in this way can provide, you know, the the service assurance.

With that actually I conclude my presentation. Actually we do have a side meeting on Tuesday afternoon, actually bring all the many participants to get together to explore, you know, how agent AI can be work for the IP network operation. We get a lot of good comments actually we see people common interested single agent reliability and agent benchmarking or agent test is a, you know, common issue. And also how to, you know, develop semantics can be better consumed by the AI or large language model, you know, to to have large language model friendly semantics or agent friendly semantics become very important. Yeah, with that I conclude my presentation. Thanks, any comments?

Jefferson Campos Nobre: Thanks, Chin. I think we'll have time for just one question. Uh-huh.

Speaker 1: Hello, thank you, Chin, for the presentation. Actually, I have a couple of comments or questions. Maybe something is just in we have been in the community network management having studied in the last 20 years several versions of autonomous networking with agents, without agents, different flavors and even in IETF and IRTF, there have been groups and attempts on that. So I would think it could be useful at least to look because you seem to propose a bit of a new baseline for an architecture to not be too much forgetting what's happened in the past and to look, okay, what can we build also on lessons learned from what wasn't working before? Okay, and contextualizing with the new the new technology on Agentic AI.

The other thing is that you seem to base a lot of your work on, of course, agents, Agentic agents. Uh, I mean, AI agents. What I don't see in your proposal is actually everything related to governing the the the the collective of agents in the context of network management. Because in one of your figures, you have mapping of different yeah, this one for instance. You have multiple agents, but the agents are kind of function or task specific.

And I don't see in terms of management or governance, how do I coordinate that they have a global behavior that if they have emerging goals, how do I manage that? Uh, the deployment aspects, the uh some aspect related to trust domains, so I think they there could be a span of different like an orthogonal dimensions in terms of governing the set of agents for the purpose of managing a networks. And I think this is probably something that is lacking because if I look at this, I don't know how I am going to deploy it and operate it from an agent perspective. It's it's a it's a management architecture, as I see it, with the with the task, but I don't see why having agent I I don't know how to manage my agent with doing that. So this could be something to think about.

Chin: Too good comments actually. For the first comment please let us know relevant reference. I also can do the homework and and we will, you know, try to, you know, add more this kind of motivation in the abstract to address your comment. Second, I think this just a logical architecture. Yeah, you are right. Actually we really need to expand a little bit to cover more. Yeah, happy to collaborate with you if you are interested. Thank you.

Jefferson Campos Nobre: Okay. Thanks, Chin. Yeah. So we'll start the the time allocated for the AI presentation. So the first one it'll be Towards Intelligent Network Configuration Management with LLMs-Wenlong Ding.

Shuguang: Hello, hello everyone. I'm Shuguang from Zhongguancun Lab. Today, I will introduce our work, a framework to evaluate LLM agents for network configuration. We all know that more and more researcher are using LLM agents to config the network. However, existing evaluation methods are too simplistic to reflect the complexity and diversity of real network environments. So we would say that the community lacks a standardized evaluation benchmark for those agents. So we propose a comprehensive framework for those agents.

Here is the overview of our framework. It has four component: the task dataset, the emulator environment, the AI agent, and the evaluator. The first component is task dataset. It's a repository of task. Each of them each is a JSON object and is include six part: the intent, the topology, initial config, ground truth config, reasoning chains, and test cases.

The second one is emulator environment. We use the GNS3 to build a environment with real vendor device images. It provide four APIs to the agent and two APIs to the evaluator. The third one is AI agent. Here is referred to the agent to be evaluated. And the last one is the evaluator. It computes three scores for each task. I will introduce in the later slides.

So the workflow has six step. First, the task dataset gives a task definition to the agent, environment, and evaluator. Step two, the environment set up the net topology and initial config. Step three, the agent interact with the environment. Step four, the agent provide his reasoning process to the evaluator. Step five, the environment gives final configs to the evaluator. And final step, the evaluator calculates the metrics score by the run by running tests.

So we have three key changes in draft 01. First, we add a new section MCP-based implementation. Two, we refine some more details on environment evaluation metrics. Three, we add some terminology and description updates.

For the new section, we add the MCP protocol as a standardized interaction interface. It means the AI agent acts as the MCP client, and the network environment environment acts as a MCP service. It provide a unified unified way that works across the different agents and environments and tools. And it's easy to add new tools and environment change changing the agents, without changing the agents.

For the evaluation metrics, they are three scores. First is the reasoning score. It measures if the agent reasoning aligns with expert annotated ground truth. We use the cosine similarity to get this score. The second score is the command score. It measures if agent generate the correct configuration commands. We use the F1 score to compare the applied commands against the ground truth command.

The third score is test cases. It measures if the configuration actually works in the network. We running test cases in the emulator and take the pass rate test as the score. So those three metrics covers the reasoning quality, command accuracy, and functional correctness.

For the future work, we want to support multi-agent collaboration via H2A protocols to handle complex configuration tasks. And we want to increase task complexity with large topologies and cross-domain configurations. And also want to use agent skills to define the evaluation workflow for simpler implementation and more flexible benchmarking. Thank you for your listening.

Jefferson Campos Nobre: Thank you, Shuguang.

Speaker 1: Thank you for your presentation. I have also couple of comment and question. Um, the first one is that I I think I also speaking as individual, I really like to see this type of work, um, trying to have some method to evaluate and benchmark as we are all jumping on the agentic AI and AI. So I think this is really worth to do it. Thanks.

But I have actually one question and one more comment. The first question is why do you think what you propose, for example, for evaluation is specific to agentic? We could use that for any probably solution for doing network configuration. We don't need to we could evaluate we could use your framework to evaluate not only agentic AI solution, I will say.

And the second comment, I think it's one big task as you know is this dataset construction. And I think it would be good a bit to rather than only saying that we have this dataset format and we extend it, is to look at other datasets that exist. That's why in the in the in the I think although with the first session, we have one speaker that also work on dataset for for this type of work. We have also speakers in Montreal that also have proposed datasets. So at some point, I think it would be good that we try together to build larger at least not a single dataset, but a set of datasets, but to be a bit aligned on what we would like to see in dataset and so on.

Shuguang: Thank you for your suggestion. Indeed, the dataset currently is very small. It has no enough enough test cases. And we want to share the task task format so many people can submit some new test cases and make the benchmark work better.

Jefferson Campos Nobre: We have time for a quick question. Loreen, if you may?

Loreen: Again, maybe two point. I think um I suggest you to contribute to the GSMA Open Telco benchmark, because what you do, they have several different use cases and I think what you propose could be very well appreciated in this community.

My second question is more about um in the configuration that you try to test, is it like you have only I mean, you pose a problem to the agent and you want it to output a configuration and you compare to a ground truth configuration that could come from someone that knows what the configuration should look like.

So my question is in the problem that you pose, is it like there could be only a single configuration that match, or there could be variations, or it could be different steps into how the configuration could look like? So what is the variety of the answer that you expect? Is it like there is only a single answer that can validate, or there could be variations? Because then it can change also the evaluation.

Shuguang: It's a good question. Indeed, maybe the config make the network works works better is not not only the only way to a one set of correct commands. There may be multiple way to achieve the same effects. We will research this problem and revision over draft and the research paper in the future. Okay.

Jefferson Campos Nobre: Okay, thanks Shuguang. We can always move the this discussion to the the mail list or the chat. So thank you. Next presentation. Framework and Automation Levels for AI-Assisted Network Protocol Testing.

Chin: Okay. Good afternoon everyone. I present this short draft on behalf of Dr. Guorui Xie because he is on an urgent business trip. Okay. So the title of this draft is about INIP, in-network inference protocol.

The limitations of traditional AI deployment is mainly in two sides. The first one is about latency. The inference on servers often introduces significant transmission latency, which is unsuitable for real-time response requirements. And the second limitation is about resource efficiency, due to fail to utilize existing data plane resources, missing opportunities for green networking and hardware optimization.

So the idea of INIP is to offload the inference workload to the in-network devices. On the control plane, the control plane acts as the central brain. It stores the original models, distills them into general decision trees, and handles fallback inference. On the data plane, it performs packet parsing, match action table based inference, only when it is the specific destination of an INIP packet.

So there are many four key techniques. The first one is about model distillation. It translates neural networks into decision trees for rule-based data plane installation. The second is about CDN-like scheduling. Users request neural networks from controller, then the hot neural networks are installed to the data plane devices.

The third key techniques is that users send INIP packets with features embedded, and then the targeted network device perform match action based inference. The last is about inference fallback. INIP packets are forwarded to the controller if not matched by the network devices. So actually the whole idea is to transform the neural network models into decision trees that can be deployed on the data plane of switches, which can improve the performance of the inference.

Okay. That's all about the draft. Thank you.

Jefferson Campos Nobre: Thank you, Chin. We have time for some questions. Uh-huh. You? No? I think it's just the you have Hanling. Yeah.

Hanling Wang: Okay, yeah. So let me remove you.

Jefferson Campos Nobre: So the next on the line will be slides-125-nmrg-mcp-applicability-network-management-02. I think the question it's for you, Chin. If you can return, it'll be nice.

Speaker 1: Hi, and thank you for your presentation. And I have a quick question regarding the model distillation. And when we like transferring it into a common decision tree, how do you do with the accuracy loss when doing this distillation, and especially for like you are going from the linear model to a nonlinear model to a linear model like? Um, how do you usually deal with those accuracy loss, and how to like prune those features?

Chin: Okay. Thank you for the great question. I think actually I maybe I can respond to this question according to my own understanding because actually I'm not the author of this specific draft. The author is from the same working group of me. So actually I think the question about it's actually about the latency, the tradeoff between the latency and accuracy.

I think actually by transforming the neural network models into decision trees, the accuracy would at a higher probability then the accuracy will be degraded because the model is becoming more simple. And in this work is actually there are some solutions, for example, by fine-tuning the decision trees or adding some specific techniques to improve the privacy, to the accuracy.

But I think the performance actually would be limited. And the advantage of this approach is that it can be deployed on the data plane because since the data plane actually is equipped with very limited resources. So I think this method is only suitable for some simple or small-scale neural networks. It might not be effective for the like large language models. So I think this is a the core drawback of this method and mainly it is due to the capability, the resource of the data plane. Yeah. Thank you.

Jefferson Campos Nobre: Okay, Jerome.

Jerome Francois: Thank you for your talk. Um, yeah just a quick quick question. I understand that this is not your main main work. But um, I think there are a lot of similarities regarding what has been done in the past regarding in-network computing because at the end, it's what you want to do here. So I was wondering if the focus also could be more how you transform the model to something that you can compute, because we already investigated how to distribute some computation once we know the function we need to do in terms of mathematics and so on. There are already some proposal.

Or if you if you want also to improve this. I'm not sure what is the focus also because or the positioning regarding what has been done before in all this computation to the data plane.

Chin: You mean to improve the accuracy?

Jerome Francois: No, I mean that because you say that you need to schedule where to compute what to compute on data plane and there have been some prior works that already proposed something like that, not specific to inference, just general computation. You give a function that you want to compute on the data plane and they just allocate resources into P4 switches. I saw that you have mentioned that in in your draft. So I don't know how this how you will position your work regarding that.

Chin: I I think this question might be a bit hard for me.

Jerome Francois: Yeah, I understand, I understand.

Chin: I think it would be better to maybe to ask the author. I I will ask him to maybe to contact the chair. Thank you.

Jefferson Campos Nobre: Okay. And a quick question from Sheng.

Sheng: Chan from China Mobile. Quick question here that have you published your work with an individual draft or some paper? I didn't find it in the website, because I if we want to know some details like fixed packet length with 64 bits and some extended packet header, so where can can I find the detail information? Thanks.

Chin: Actually the core idea of this is published in one of the paper in like 2023 in INFOCOM by the author Xie Guorui. I think maybe you can search her name on Google Scholar, and she has published one paper about this.

Sheng: Okay. Do you have plan to submit a draft in NMRG? Or have you already done it?

Chin: Yes, actually this draft has already submitted but he doesn't specify a working group because he doesn't determine which is the most suitable one.

Sheng: Okay, thank you.

Chin: Oh, thank you for your question.

Jefferson Campos Nobre: Okay, thank you Chin. So the next presentation Applicability of A2A to the Network Management from Shuyi. So we are running a little late, so I decrease your time. So please, if you you can make it quicker, it'd be nice.

Shuyi: Okay, thank you. Uh, hi everyone. Uh, today I'm going to present the agent network part. Uh, I will just speed, so here I just list that there are multiple agent interconnection scenario and after that you there must have much interference at different level, thus in resulting a complex network. So um in in order to solve this, the gateway and router should take the responsibilities.

Okay. So here first I will give two conceptions. The first one is agentic gateway. Um, so this is to say in the future the gateway will be an agent itself. Besides doing the normal job of a gateway, it the agentic gateway should also perceiving information and taking corresponding action then with based on the LLM or other AI capabilities, it can do the you know decision planning um and analysis and also action execution. Um, and here I list a use case for the further explanation. But here I just say very quick. That is in the industry factory, the smart manufacturing robots will be managed and controlled by the same gateway and they can, you know, receive the commanders and also giving some real-time operational data then for the further analysis.

So the next one is agentic network actually in the IETF background should be agentic IP network, which means it is a new network architecture that will composed of agents, network resources computing resources and also support the agent the protocols. Uh, okay.

So the next slides will going to be the detailed requirements. So the first one is the forwarding routing anyway it's to help the user find the collaborate agents and also have the forwarding path. The way may be establish a routing forwarding protocol and etc. Next one is a internet network environment perception. That is to say the agent should be actively to handling all the things so the perception ability actually is the trigger. So perceiving the intent and environmental data is to help to do the next step. Um, okay.

Next one is a protocol compatibility and conversion. So this happens when device with different protocols in the same gateway to communicate with each other. Okay. Next is okay, so here is to say the gateway needs to determine the whether the requirement of the inference should be done at the edge or the central. Um, okay, the management is that the agentic gateway as a agent should be modeled and managed by the upper controller.

Okay, I'll just skip it. More information can be found at the my another draft. Also for this part. Uh, so the architecture here currently the architecture of agentic network covers for the six-layer. And actually we need to do further work of the function of each layer and how to implementate them and the API definitions.

Okay. So for next steps, you know the the draft is at the very early stage. So the next step will be doing more research of the requirements of the agentic network and also the agentic gateway, then map the requirements to each layer, so that is the next steps and anyone who is interested in my topic is welcome to give comments and also work together for this. Okay, thank you.

Jefferson Campos Nobre: Thank you Shuyi for the presentation. We have time for maybe one question. A quick one. No questions? So always we can send your questions for using the chat or using our mailing list. So thank you Shuyi. We're going for the next presentation. Use Cases and Practices for Intent-Based Networking.

Mohamed: Hi everyone. Can you see me? And can you hear me? Hi everyone. So my presentation is actually the architecture principles, Agentic AI architecture principles for autonomous computer networks. I present interdigital and these are my co-authors.

Next slide please. Oh, I can do it myself. Oh, thank you. So what is the problem that we have compared to the traditional approach which is statically configured, hard-coded, and centralized? And also currently we have agentic AI networks that are goal-driven, that can reason, that can plan, that can invoke tools, but the problem is that they are vendor lock silos across layers and there is no standard way to name, discover, or coordinate agents.

So for that reason, I think what we need is the argumentation into distributed goal-driven network where we have standardized interdigital inter-agent networking. So agents usually what we have is that they cannot talk to each other currently between different vendors and we need mechanism for agents to discover and coordinate with each other.

So I think building on that, we already know IP TCP IP protocol suit and I think the proposal here is that when we talk about this stack layer concept we have these four layers of five sometimes depending on how do you see it.

So the proposed architecture really tells you that for each layer we need to have an agentic network where each layer has its own set of agents that are that can communicate within the layer and then you have a controller or an orchestrator however you want to call it that can control orchestrate these agent net within the layer and it can also talk across the layers.

So basically whatever we have currently the services interface of deterministic the deterministic abilities that we have at each layer they remain intact. So agents act as first-class entities alongside each layer and we need to have basically explicit separation of packet transport from automation logic for each layer.

And when we talk about per layer, so for example at each layer you need to have a controller that can parse intent that can decompose task and obviously we need to have some guardrails around it. And that controller can assign task to different agents. So for example you can have a congestion agent, QoS enforcement agent, policy enforcement agent, and each agent can invoke tools that already exist at each layer.

And so what do we have currently in terms of challenges? Currently we do not have any standardized mechanism of naming, addressing, schema for these agents cross layers. We have limited discoverability and data provision, so exchange of telemetry and operational data. And also I think security and I think in within the security, we we also need to have some we also need to have some mechanisms of guard-railing and and we need to move towards more deterministic system.

So because agentic is inherently a stochastic so each agent that can act, that can reason and plan, it's all stochastic. So how can we achieve determinism in that? I think we need to have some kind of validations, we need to have some we need to have some kind of guardrails around this and we need to have some auditable traces and observability in this kind of environment. So I think that can help us achieve some sort of determinism and at least we are able to track, observe and validate the actions of agents. And by putting the guardrails around them we will be able to, I think, have a better or more deterministic than what already is.

Mohamed, we run out of time, so if you can wrap it up, it'd be nice.

Mohamed: Okay. So what I want the network management research group is to study and agree on agentic AI and related terminology because a lot of work is going on on this and I think it would be good to study this and whatever the architecture that I have proposed, so if we can refine, I would be happy to contribute and also collaborate. I'll be welcome any collaborations within this group. So basically that's the whole point of bringing this draft is to work together, agree on a terminology and then refine the proposed architecture and build some kind of architecture around agentic AI. So thank you very much, that's all I have.

Jefferson Campos Nobre: Okay, thank you Mohamed. Unfortunately, we don't have time for for questions. So the next presentation from Minze, right?

Minze: Okay everyone, this is Minze from Zhongguancun Lab. I'd like to introduce our updated version of our draft titled A Framework for LLM agents assisted network management with human in the loop.

So let's start from the motivation. Traditional network management faces some challenges like the the modern network is becoming more complex and dynamic and and the vendor heterogeneous. So so the network operators need need to handle with the complex and the dynamic intents and and the learning curve on vendor specific devices, which will introduce a high operational cost.

So the TMF Forum has proposed the long-term vision for autonomous network management, which includes the concepts like the zero weight, zero touch and self-configuration, self-healing, and so on. So I think maybe we have achieved a a rough consensus that the agentic network management might be a a promising way to achieve the autonomous network management.

As we can see that there are lots and lots of agentic network management and and some and other application drafts in IETF and IRTF that includes the discussion of the agents, the communication between agent-to-agent and agent-to-tools and the digital twin and and the intent-based agentic network management.

So as we all know that the agent's behavior is not always reliable and trusted. So so we want to discuss more about how to guarantee the safety and and and the security of the agentic network management system.

So our our key idea is from the NMRG charter which says that human user should still remain in the loop and it will be a long term to progressively replace traditional network management with agentic network management and during this process we need to define the interface between human and the autonomous network management system. So in this this draft, we discussed the framework that build this agentic network management and its workflow and its interface.

So in this this framework, this framework includes three components: like the enhanced telemetry model which is responsible to inject some some field description and some auxiliary information into the the raw telemetry data. And the agent decision model is responsible to specify a task instance and generate a corresponding configuration. Finally, the human operator should audit the generated configuration before issuing this configuration into devices.

So the the key points of this framework is that we should define the human-to-agent to human interface, like the agents generates the configuration or some actionable policies and the corresponding configuration confidence scores and the explainability logs and then the operator performs the audit process and which is recorded.

So regarding the comments we received in the IETF 124, we add some security considerations regarding the different aspects of the autonomous network management system. We also propose some threat vectors like the prompt injection. They attacks may modify the hostname, for example they ignore previous instruction and delete BGP session, which will result the the unauthorized configuration.

And the second threat vector is the agent identity spoofing. Like they some attacks may impersonate a task agent by stealing its agent key. And the third threat vector is the DDoS attack like the agent is a very computation-intensive, so we need to limit the request rate.

So we we gave a in-depth security discussion of the the whole pipeline or workflow. In this process, we can see that we need a human in the loop. And we also implemented a demo, which includes the threat perception and defense policy generation and policy optimization agents and and the configuration should be audited by the operation before issuing this configuration into devices.

Okay, thanks. Our draft is also on the GitHub. Welcome more issues and PRs. Thank you.

Jefferson Campos Nobre: Okay, thank you Minze. Thank you for your comprehension on the time. So if you have questions, please send him in using the chat or our mailing list. So thank you. The next presentation will be performed by Carlos, right? As you can imagine, you are run of time, so I ask you to go as as fast as you can.

Carlos: I'll be very fast and it's also Thursday and this is yet another presentation on AI. So let me go fast.

So basically this is a first exercise trying to identify and define how we can use agentic AI for a use case that is based on sensing. So the idea is that we have one particular application, although the potential architecture that the results of this is will be applicable to others, where we define a simple architecture based on agents for the specific application which is sensing in this case and agents for networking, and we try to identify and develop interaction between those agents in order to overcome a complex task, which is how to develop distributed sensing or how to implement distributed sensing which has requirements on both the communication, the networking and both at the application layer.

So we do this, you can read the draft, I will not go into the details of course, but then one point that I would like to maybe raise or just spend one minute is on the open issues and questions because in this specific use case, we have on the one hand questions related to how we define the the task, in this case is the sensing, how we develop or how we define data governance aspects in the architecture for the data that is going to be used, how we also define specify the allowed agentic interaction level in the sense of how many agents do we want to allow to participate into this operation and also the networking requirements.

And with that, we identify a set of questions for this group in this case, but is in general for IETF, whatever the AI work is going to end up being done, which is agent discovery and registration, agent to agent protocol which are the elements that needs to be standardized. And then for the case of multi-domain sensing, what are the trust privacy mechanisms that we may need to involve for these agentic communications, how to resolve conflicts, and of course and last question, if there is anyone interested in working on this topic in this group or whatever group that ends up taking being the home, please let us know and we will be happy to collaborate with you guys on this. Sorry.

Jefferson Campos Nobre: Okay, thank you, thank you Carlos. Right on time. Okay so this is the last presentation. So thank you for attending our session and also for the presenters and see you in Vienna, right? Bye-bye.