NMRG Interim Meeting Notes
Session Date/Time: 07 Oct 2025, 22:00
Summary
This NMRG interim meeting focused on Agentic AI for network management, featuring three presentations and an open discussion. Key topics included the definition and characteristics of Agentic AI, its potential and challenges in network management, a conceptual distinction between network automation and autonomy, and the applicability of the Model Context Protocol (MCP) for network management. Attendees engaged in discussions about security implications, deployment models, and the adaptation of large language models for network configuration tasks.
Key Discussion Points
1. Agentic AI for Network Management (Jungen)
- Definition & Characteristics: Jungen defined Agentic AI as a class of AI focusing on autonomous systems capable of decision-making and task execution with limited human intervention. Key characteristics include autonomy, goal-oriented behavior, adaptability/learning, LLM integration, and interaction with external tools. Autonomy is highlighted as crucial for network management.
- Paradigm Shift: Agentic AI represents a shift from traditional automation (relying on pre-programmed rules) to autonomous decision-making for large-scale network activities and customer requests.
- Limitations of Existing Techniques:
- Intent-Based Management (IBM): Relies on predefined data models (e.g., YANG) and requires human oversight, lacking adaptability and flexibility for unforeseen conditions. Agentic AI aims to minimize or eliminate human intervention.
- Autonomic Service Agents (ASA): Typically designed for specific, localized functions with predefined policy structures, lacking complex reasoning or self-reflection capabilities characteristic of Agentic AI.
- Architectural Bottlenecks: Existing centralized AI systems struggle with the volume, velocity, and distributed nature of Agentic AI workflows, leading to high latency and single points of failure. Distributed mesh architectures are suggested.
- Absence of Agent-to-Agent Semantic Interoperability: Different vendors and frameworks lead to fragmentation. Standardization is needed for consistent payloads and interfaces to enable agents to discover, understand, and collaborate.
- Lack of Dynamic Trust and Accountability: Autonomous agents performing actions at machine speed raise security and governance challenges, requiring dynamic access control and accountability mechanisms.
- Real-time Validity and Resilience: Agent decisions depend on data quality. Incomplete, delayed, or corrupted data can lead to operational or financial losses.
- Proposed Purposes:
- Hyper-autonomous network operation management (self-driving network).
- Intelligent and dynamic resource orchestration (6G, holographic comms, large-scale IoT).
- Predictive and adaptive network security (detecting zero-day attacks, reconfiguring defenses).
- Enabling mobile network service models (network as a distributed AI platform).
- Implementation Example: A very early-stage "robot agentic AI of edge network" project was shared, involving camera, Agentic AI (Docker containers with LLM), and robot entities. This demo primarily uses AI for network workload at the edge, with future plans to integrate network management aspects.
- Discussion:
- AI Agent vs. Agentic AI: Clarification that the focus is on "Agentic AI."
- Relevance of Robot Example: Acknowledged as early stage, with future intent to incorporate network management aspects.
- Security Concerns: Recognized as critical due to Agentic AI's action capability. It was noted that this will require significant research and development in monitoring and control of Agentic AI workflows, similar to past efforts for other technologies.
- Scope of Application: A suggestion was made to focus on the new research challenges introduced by Agentic AI rather than just applying it to existing problem domains.
2. Automation vs. Autonomy (Chris)
- Starting Point: Referencing draft-ietf-nmop-operator-intent-and-its-lifecycle, Chris outlined foundational automation capabilities (intent-driven closed loops, NDTs, analytics, algorithms).
- Distinction:
- Automation: An objective-driven process that follows pre-designed rules (e.g., a thermostat).
- Autonomy: Adds the ability to adapt the automation process to new or unanticipated circumstances, driven by evolving knowledge and reasoning/planning processes.
- Key Functions for Autonomy: Test planning, reflection, and self-optimization are identified as planning/reasoning functions for designing or redesigning an automated operations process. This is distinct from the execution of the process.
- When are these invoked?
- Initial stand-up phase of an automation use case (though less common in network management to design from scratch).
- More importantly, adapting an existing automation process when "new knowledge" indicates the current process is inadequate (e.g., changes in operating environment, system composition, performance, or qualitative shifts in intent).
- Two Closed Loops (ETSI ZSM concept):
- Reactive Closed Loop: The ongoing, operating automation (e.g., optimizing optical channel powers based on physics). The physics don't change, so the process doesn't need constant redesign.
- Proactive Closed Loop: Looks for and digests "useful new knowledge" (knowledge that would drive a redesign), then invokes reasoning/planning capabilities to modify the reactive closed loop. The reactive loop can also contribute to this knowledge.
- Sources of Useful New Knowledge:
- Significant changes in the network system's operating environment.
- Changes in system composition or functional details (e.g., new devices).
- Inadequate performance of the existing closed loop.
- Qualitative changes in intents (outside the scope of the current automation design).
- Knowledge from experience beyond the direct system (e.g., LLM's vast training data).
- Manual input (e.g., feeding new spec sheets).
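The reactive/proactive split described above can be sketched as a minimal Python model. This is an illustrative reduction of the ETSI ZSM idea, not an implementation from the presentation: the control rule, the gain value, and the "three consecutive misses" trigger are all invented here purely to show how a proactive loop might digest one source of new knowledge (inadequate performance of the existing closed loop) and redesign the reactive rule.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ReactiveLoop:
    """The ongoing automation: applies a fixed control rule each cycle."""
    rule: Callable[[float], float]              # observation -> corrective action
    observations: list = field(default_factory=list)

    def step(self, observation: float) -> float:
        self.observations.append(observation)
        return self.rule(observation)

class ProactiveLoop:
    """Watches for 'useful new knowledge' and redesigns the reactive rule."""
    def __init__(self, reactive: ReactiveLoop, target: float):
        self.reactive = reactive
        self.target = target

    def digest(self) -> bool:
        # Treat sustained poor performance as new knowledge that the current
        # design is inadequate (one of the knowledge sources listed above).
        recent = self.reactive.observations[-3:]
        if len(recent) == 3 and all(abs(o - self.target) > 1.0 for o in recent):
            gain = 0.8  # redesigned, more aggressive rule (illustrative value)
            self.reactive.rule = lambda o: gain * (self.target - o)
            return True
        return False
```

The point of the sketch is the separation of concerns: `ReactiveLoop.step` executes the automation, while `ProactiveLoop.digest` is the incremental autonomy capability that can replace the automation's design.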
- Human/LLM Interaction: The presentation emphasized flexibility, noting that LLMs can support manual/semi-manual processes, enable scrutiny of proposed redesigns, or even stimulate proactive loop actions based on human observation.
- Discussion:
- Intent/Analysis/Decision/Execution: While part of automation, for autonomy, they are enhanced by self-awareness and choice-making capabilities that can redesign the underlying automation. Autonomy is considered a superset of automation, with the proactive closed loop being the key incremental capability.
- MCP's Role: The role of MCP in tool choice was deferred for further discussion and the next presentation.
3. Applicability of MCP for Network Management (Shinu)
- MCP Overview: The Model Context Protocol (MCP) is an open protocol, developed by Anthropic, that decouples large language model (LLM) applications from tools. It's gaining popularity for various automation tasks.
- Value for Network Operators:
- Network Intelligence: Makes network architecture agile by abstracting network functions into network agents and tools, enabling proactive closed-loop management.
- Network Exposure: Allows network controllers to expose rich network capabilities as an MCP server to third-party AI agents/applications, and allows network AI agents to consume external capabilities.
- MCP for Network Exposure:
- Case 1 (IETF network exposure consumer): Network controller exposes capabilities (topology, path computation) as an MCP server. It also acts as a tools registry for external APIs.
- Case 2 (IETF network agent consuming external): Network AI agent (MCP client) requests tool lists from the network controller (MCP server), which discovers them from external data sources.
- MCP Server Discovery: Important for scenarios with multiple MCP servers providing different capabilities (network tools, resources, prompts).
- Deployment Cases:
- Case A: MCP client + LLM + MCP server all on the network controller. Reuses legacy management protocols (NETCONF, telemetry).
- Case B: MCP client + LLM on OSS-BSS layer (user side), MCP server on network controller. Reuses legacy management protocols.
- Case C: MCP client + LLM on network controller, MCP server on each network device. This requires devices to support MCP, replacing legacy protocols, leading to radical changes but potentially better real-time performance without translation overhead.
- Comparison of Cases: Evaluated on LLM invocation, intent recognition, task planning, action execution, network impact, performance, and resource consumption. Case C has high resource consumption for devices.
- MCP Architecture:
- User provides natural language request to MCP client.
- MCP client (with LLM) understands intent, formats tool calls.
- MCP client interacts with MCP server to execute tools.
- MCP server returns results; LLM processes results into natural language response.
- Workflow includes encapsulating device operations as MCP tools, LLM for intent-to-tools translation, and closed-loop automation execution.
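The client/server exchange outlined above can be illustrated with a deliberately simplified Python sketch. It is not the real MCP SDK: the class names, the keyword-based intent map standing in for the LLM, and the `get_topology` tool are all hypothetical, and only the shape of the flow (register tools, list tools, translate intent, call tool) mirrors the workflow described.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., Any]

class MCPServer:
    """Exposes device operations as named tools (minimal stand-in)."""
    def __init__(self):
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs) -> Any:
        return self._tools[name].handler(**kwargs)

class MCPClient:
    """Pairs an intent recognizer with a server. A real client would use
    an LLM here; this sketch substitutes a keyword lookup."""
    def __init__(self, server: MCPServer, intent_map: dict[str, str]):
        self.server = server
        self.intent_map = intent_map

    def handle(self, request: str) -> Any:
        for keyword, tool_name in self.intent_map.items():
            if keyword in request.lower():
                return self.server.call(tool_name)
        raise ValueError(f"no tool matches request: {request!r}")
```

For example, registering a `get_topology` tool and asking the client "Show me the network topology" would route the request to that tool and return its result for the LLM to summarize.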
- Usage Examples:
- Configure Service Interface: User requests L3VPN site modification; MCP client (with LLM) interprets intent, generates tool chain sequence, sends to MCP server for execution of VPN service modification workflow.
- Device Interface Configuration: User issues natural language command (e.g., "display interface brief"); MCP client (with LLM) translates to tool call; MCP server executes (integrating vendor-specific CLIs); LLM processes raw data to natural language response. This removes the need for operators to learn vendor-specific CLIs.
- Next Steps: Continue investigating MCP integration with open-source tools (InfoHub), YANG data models, GMI, and for energy-saving conversions.
- Discussion:
- Case A Value: Clarified as exposing network capabilities and consuming external tools through the controller.
- Replacing NETCONF in Case C: The intent is not to replace existing protocols like NETCONF or telemetry, but to integrate and refactor existing device operations and data models (such as YANG) as MCP tools, especially for aspects not covered by YANG. Using gateway devices to support MCP on behalf of downstream devices is an experimental option.
- MCP Agent vs. Server: Agents are outside the MCP server, acting as clients.
- Security/Authorization: Acknowledged as an open issue, with ongoing research in the MCP community regarding OAuth 2.1 and OIDC for federated access and authentication across multiple MCP servers.
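One narrow slice of the authorization question can be sketched as a scope check applied before a tool executes. The scope names and tool-to-scope mapping below are invented for illustration; a real deployment would sit behind cryptographically validated OAuth 2.1 / OIDC tokens, which this sketch assumes have already been verified.

```python
# Illustrative mapping from MCP tool to the scope it requires.
TOOL_SCOPES = {"get_topology": "net:read", "modify_vpn": "net:write"}

def authorize(tool: str, granted_scopes: set[str]) -> bool:
    """Allow a tool call only if the caller's (pre-validated) token
    grants the scope that tool requires."""
    required = TOOL_SCOPES.get(tool)
    return required is not None and required in granted_scopes
```

Even this toy version makes the open issue visible: with multiple MCP servers, the scope vocabulary and the token issuer must be federated consistently across all of them.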
4. Exploring Language Technologies for Autonomous Network Configuration (Angela)
- Core Idea: Treat network management commands as a specialized language that language models (e.g., BERT) can interpret to understand an administrator's intent.
- Context: Language models such as BERT are effective at interpreting language in context.
- Problem: Can LLMs autonomously interpret the intent behind network commands for configuration?
- Related Work: Mentioned several academic papers using LLMs (including GPT-4) for P4 device configuration, synthesizing configurations, and general network configuration management. Highlighted that some research shows GPT-4 performing poorly without specific fine-tuning for this context.
- IETF/IRTF Context: Referenced a previous side meeting on large models for networking and draft-ietf-nmop-assisted-nm-framework.
- Open Questions:
- How to build an NLP pipeline where high-level policies/intents (natural language) are input and configuration commands are the output.
- Alternatively, how to modify an NLP pipeline to directly process network configuration commands as a "special type of language," rather than natural language input/output.
- What's the most useful approach for NMRG: creating custom LLMs or fine-tuning existing ones?
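A first step toward the second open question, treating configuration commands as a "special type of language", is domain-aware tokenization before any fine-tuning. The regular expression below is a hypothetical sketch, not from the presentation: it keeps IP prefixes and interface names as single tokens so a model would see them as atomic units rather than fragments.

```python
import re

def tokenize_config(line: str) -> list[str]:
    """Split a configuration line into tokens, keeping IPv4 prefixes and
    interface-style names whole (illustrative tokenizer sketch)."""
    pattern = r"\d+\.\d+\.\d+\.\d+(?:/\d+)?|[A-Za-z]+[\w/.-]*|\S"
    return re.findall(pattern, line)
```

For instance, `ip address 192.168.1.1/24` yields three tokens instead of splitting the prefix on dots and the slash, which is the kind of adaptation a BERT-style vocabulary for network management would need.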
- Discussion:
- LLM Validation: Questioned how the referenced papers validate their LLMs for network configuration, specifically regarding benchmarks and dataset scale. Angela noted they created their own specific benchmarks, and a key challenge is acquiring an adequate corpus of network comments/configurations.
- Adapting BERT: BERT (or similar foundational models) cannot be used directly but would require adaptation and fine-tuning for the specific domain of network management. This adaptation itself could be a valuable research contribution.
Decisions and Action Items
- Jungen:
- Update draft-irtf-nmrg-ai-deploy to include Agentic AI concepts and distributed deployment methods.
- Submit a new Internet-Draft titled "Motivation and Problem Statement for Agentic AI in Network Management" (call for collaboration on this draft).
- Shinu: Take note of the chat questions and comments regarding MCP and respond on the mailing list.
- Angela: Continue investigating the adaptation of LLMs for network configuration, considering the need for domain-specific corpora and benchmarks.
- All Presenters & Participants: Continue discussions on the NMRG mailing list for complex technical topics and open questions (e.g., security of Agentic AI, automation vs. autonomy distinctions, MCP integration challenges).
Next Steps
- Further research and discussion within NMRG on Agentic AI and its implications for network management, including architectural models, security, and interoperability.
- Exploration of the Model Context Protocol (MCP) as a means for network exposure, agent-to-agent communication, and interface configuration.
- Investigation into adapting and fine-tuning large language models (LLMs) for understanding and generating network configuration, including the development of relevant datasets and benchmarks.
- The co-chairs will publish the meeting notes on the IRTF data tracker.
- The community looks forward to continuing these discussions at future IETF/IRTF meetings (in-person in Montreal or online).