RASPRG

Summary

The RASPRG session featured a panel discussion on "How does standardization look in an LLM-enabled future?". Four panelists presented their perspectives on the role of Large Language Models (LLMs) in internet standardization, ranging from using LLMs as tools for understanding and processing existing specifications to envisioning a future of "prompt-run protocols" and the potential for the IETF to define a "technical language" for LLM interaction. The subsequent discussion explored the practical benefits and challenges of LLM adoption, concerns about hallucinations, the need for human validation, and the long-term implications for the IETF and the broader engineering community.

Key Discussion Points

Opening and RASPRG Goals: Alvaro Retana welcomed attendees and reviewed the IRTF Note Well and Code of Conduct. He emphasized that RASPRG focuses on long-term research related to the Internet, aiming to understand standardization processes through evidence-based work, not to produce standards or directly influence IETF policy, though insights may be shared with IETF leadership.
Panel Introduction: Ignacio Castro introduced the panel on "How does standardization look in an LLM-enabled future?", comprising Gia (University of Oslo), Jaime Jimenez (Ericsson), Joseph Potvin (Xalgorithm Foundation), and Jiankang (I.O. NGO).
Gia - Automatic Mining of Mailing Lists for Internet Protocol Design Decisions:
- Presented research on using LLMs to understand the rationales behind design decisions, which are often not explicitly documented in RFCs but found in email archives and GitHub.
- Aims to recover missing design decisions from emails using LLMs and create semi-synthetic datasets of decision-rationale pairs.
- Leverages information retrieval techniques, converting text into high-dimensional vector spaces for relevancy ranking.
- Proposed LLMs as an "auto-editor" to restore missing decisions and generate formal explanations, much like a retrieval-augmented generation (RAG) pipeline.
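The relevancy-ranking step described above can be sketched in a few lines. This is only an illustration under stated assumptions: it swaps in plain TF-IDF vectors and cosine similarity from the Python standard library in place of learned embeddings, and the sample messages and the `rank_messages` helper are hypothetical, not taken from the presented research.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that a term present in every document keeps some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_messages(query, messages):
    """Return message indices ordered from most to least similar to the query."""
    vectors = tfidf_vectors(messages + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(vectors[:-1])]
    return [i for _, i in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical mailing-list snippets, invented for this example.
messages = [
    "Agenda for the interim meeting is posted.",
    "We chose a 16-bit length field because the header must stay small.",
    "Please update your affiliation on the datatracker.",
]
ranking = rank_messages("why is the length field 16 bits", messages)
print([messages[i] for i in ranking])
```

In a pipeline of the kind described, the top-ranked messages would then be passed to the LLM as context for drafting the decision-rationale pair, with a human checking the result.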
Jaime Jimenez - Do Not Delegate Understanding to LLMs for Standards:
- Stressed that LLMs are powerful tools for delegating tasks but not for delegating understanding; human validation is essential.
- Highlighted the IETF's rich, machine-readable records (transcripts, RFCs) as excellent data sources.
- Noted a convergence of developer practices (GitHub, chatbots, CI/CD) and standardization processes. Envisioned CI/CD pipelines checking RFC consistency, similar to code quality checks.
- Demonstrated practical uses: generating reports from transcripts (e.g., IETFminutes.org), editing/reviewing documents, consistency checking, and automating mechanical edits using tools like Claude.
- Identified challenges: LLM hallucinations and compounding errors in agents, both of which require a human in the loop; the lack of established best practices for standardization-specific LLM use; the need for infrastructure; and resistance to adoption.
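A CI-style consistency check of the kind envisioned above might look like the following minimal sketch. It assumes a plain-text draft in which citations appear as bracketed tags such as `[RFC4271]` and the reference list follows a line reading `References`; the function names `check_references` and `ci_exit_code` are hypothetical and do not belong to any existing IETF tool.

```python
import re

CITATION = re.compile(r"\[(RFC\d+)\]")
REFERENCE_ENTRY = re.compile(r"^\s*\[(RFC\d+)\]", flags=re.MULTILINE)


def check_references(draft_text):
    """Return (undefined, uncited) sets of reference tags for a plain-text draft."""
    # Assumption: everything after the first "References" line is the reference list.
    body, _, refs = draft_text.partition("\nReferences\n")
    cited = set(CITATION.findall(body))
    # A reference entry is a line that starts with its bracketed tag.
    defined = set(REFERENCE_ENTRY.findall(refs))
    return cited - defined, defined - cited


def ci_exit_code(draft_text):
    """Return 1 if the draft has citation problems, 0 otherwise (for a CI job)."""
    undefined, uncited = check_references(draft_text)
    for tag in sorted(undefined):
        print(f"cited but not listed in References: {tag}")
    for tag in sorted(uncited):
        print(f"listed in References but never cited: {tag}")
    return 1 if undefined or uncited else 0


# Hypothetical draft text, invented for this example.
draft = """1. Introduction
This document updates [RFC4271] and relies on [RFC8174].

References
   [RFC4271]  Rekhter, Y., Ed., "A Border Gateway Protocol 4 (BGP-4)".
   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels".
"""
status = ci_exit_code(draft)
```

A check like this is purely mechanical, so it can run on every commit; LLM-based review would sit on top of it, with its findings still validated by a human.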
Joseph Potvin - Transforming IETF Standards into Rules-as-Data:
- Advocated for transforming precise but prose-heavy RFCs (e.g., BGP-4, RFC 4271) into structured "rules-as-data" using a finite grammar.
- Presented "Rulemaker" (built in Spell.js) and "Rule Reserve" (a network service) to create a library of consistent data structures for dynamic discovery, conformance checking, and unit testing.
- Uses LLMs as "linguistic aids" to help draft sentences into this constrained grammar, not for interpretation.
- Output assertions are expressed in a six-element finite grammar, with logic read vertically.
- The goal is to provide a short machine-readable summary (e.g., 240 characters) with direct links back to the original RFC for deep dives, so that developers can be prompted about relevant rules.
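The idea of a short summary linked back to its source can be sketched as a small data structure. This is not the Rulemaker format or the six-element grammar, whose details were not captured in these notes; the `RuleRecord` fields and the `find_rules` helper are hypothetical, and the example rule is a paraphrase of the Hold Time requirement in RFC 4271, Section 4.2, used purely for illustration.

```python
from dataclasses import dataclass

MAX_SUMMARY_CHARS = 240  # summary length mentioned in the presentation


@dataclass(frozen=True)
class RuleRecord:
    """Hypothetical rule entry: a short summary plus a pointer to the source text."""

    rule_id: str
    summary: str
    source_url: str
    keywords: frozenset

    def __post_init__(self):
        # Reject summaries that exceed the agreed length limit.
        if len(self.summary) > MAX_SUMMARY_CHARS:
            raise ValueError(f"summary exceeds {MAX_SUMMARY_CHARS} characters")


def find_rules(rules, context_terms):
    """Return the rules whose keywords overlap the terms a developer is working with."""
    terms = {t.lower() for t in context_terms}
    return [r for r in rules if r.keywords & terms]


# Illustrative entry; the keyword set is an assumption made for this sketch.
hold_time_rule = RuleRecord(
    rule_id="bgp4-hold-time",
    summary="The Hold Time in a BGP OPEN message MUST be either zero or at least three seconds.",
    source_url="https://www.rfc-editor.org/rfc/rfc4271#section-4.2",
    keywords=frozenset({"bgp", "open", "hold time"}),
)

for rule in find_rules([hold_time_rule], ["Hold Time"]):
    print(f"{rule.summary} See {rule.source_url}")
```

Keeping the record limited to a summary and a link means the RFC remains the authoritative text: the record only prompts the developer and points to the section to read.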
Jiankang - Prompt-Run Protocols:
- Introduced the concept of "Prompt-Run Protocols," envisioning a future where LLMs or SLMs (Small Language Models) could achieve interoperability by responding to prompts, potentially leading to a "post-consensus era."
- Compared this to the evolution of coding from specific languages to natural language prompting.
- Cited examples of LLMs mimicking web servers and widespread discussions about AI's impact in IETF side meetings.
- Proposed that this trend extends beyond protocols to "prompt-run applications" and even "prompt-run organizations" (e.g., bylaws as prompts).
- Suggested the IETF could become a "Royal Academy of the Technical Language" to structure prompts for intelligible and reproducible outputs across the technical community.
- Posed questions: Is this the end of protocols? What role for the IETF? How to prepare to avoid irrelevance? Should IETF define LLM-friendly RFC structures or evaluate LLM behavior for IETF content?
Panel Discussion and Q&A:
- Barriers to Adoption and Future Vision:
  - Jaime highlighted automation and efficiency as key drivers for LLM adoption in development, suggesting similar trends for standardization. He saw lack of best practices and LLM non-determinism as current roadblocks.
  - Joseph emphasized LLMs as functional tools that enable humans, advocating for "artificial naiveté" in their use.
  - Jiankang argued that incentives for automation are strong, and LLMs, like young engineers, will improve with training. He believes a "new paradigm" will eventually replace current tooling.
- Concerns about Hallucinations and Accuracy:
  - Joseph and Jaime discussed "entrenched hallucinations" – LLMs inserting extraneous or incorrect information, which is difficult for experts to spot and might be a fundamental problem.
  - Andrew Kampling expressed skepticism about calling LLMs "AI," seeing them as statistical pattern matching. He warned that early over-reliance could "hollow out" human skills and stifle innovation.
  - Jiankang countered that "satisfactory enough" output could drive adoption, and expectations for engineers (and LLMs) evolve.
  - Rob Van Meter observed that LLMs struggle with "harder questions" and finding good information, questioning if this is a fundamental limitation.
- Structuring RFCs for LLM-Friendliness:
  - Colin asked about formal methods to help LLMs model protocols. Jaime suggested RDF, ABNF, and other formal languages. Joseph noted the potential of "pattern languages" for RFCs to structure both human thought and LLM processing.
  - Jiankang reiterated the "Royal Academy" idea for defining stable, agreed-upon technical terms and keywords to ensure coherent LLM outputs. He suggested IETF could evaluate LLM performance for its content.
  - Gia proposed using LLMs as reviewers to check RFC clarity and identify misunderstandings.
  - Sue Hares emphasized the need for "failure discussions" in LLM design, drawing parallels to past automation failures in networking (e.g., YANG models) and security concerns. She advocated for template approaches but stressed the need for human "thinking outside the box" for crisis points and manageability. Joseph responded that his work provides running code for rule extraction and aims to prompt developers about conformance without automating code injection.
- Impact on the Future of Standardization:
  - Yariak asked which factors will shape the future and where this leads: fewer programmers, faster production, more tailored protocols, or giving up on standards.
  - Jaime suggested faster cycles, potential code-first spec generation, and smaller groups achieving critical mass.
  - Jiankang argued that deterministic expectations from code might not apply to LLMs, which are more akin to non-deterministic humans making "mistakes" (hallucinations). He envisioned engineers as "wizards" of prompting, maintaining a complex infrastructure.
  - Joseph warned of built-in biases in training data, potential for sophisticated abuse, and LLMs' lack of access to older, brilliant data structuring methods.
- Policies and Attribution:
  - Nicola Sign inquired about policies on LLM usage for authorship or attribution. Jaime mentioned that internal company policies and the IRTF Code of Conduct might address AI authorship (for example, by requiring disclosure if an LLM "wrote" a draft).
- Near-Term Focus:
  - Markovalser and Kunle emphasized focusing on "low-hanging fruit" – using LLMs in the near term to write clearer, shorter, more precise RFCs, and to identify inconsistencies, rather than worrying excessively about distant futuristic scenarios.
  - Gia added that fine-tuning LLMs should involve providing reasoning or "thought processes", not just input-output pairs, so that the models learn how to reason and their output can be trusted.

Decisions and Action Items

No formal decisions were made during the session.
Action Item (Informal Offer): Sue Hares expressed interest in collaborating on LLM experimentation for BGP-related RFCs, offering a set of 20+ RFCs that need to follow a template, and Joseph Potvin welcomed the collaboration.

Next Steps

Continue discussions on the integration and implications of LLMs in IETF processes.
Explore practical applications of LLMs for improving RFC clarity, consistency, and review.
Consider research into structuring RFCs and defining pattern languages to make them more amenable to LLM processing.
Potentially investigate the creation of fine-tuned LLMs specifically trained on IETF documents with expert feedback.