AIPREF

Summary

The AIPREF Working Group held its interim meeting to progress discussions on draft-ietf-aipref-vocab. The session focused on resolving key issues surrounding the legal context, conformance, and applicability of machine-readable preferences. The group also evaluated and established starting points for the definitions of "AI training" and "search" preferences, and debated the potential introduction of a broader "use" or "input" category.

Meeting logistics and agendas were presented from the Webex Link slide deck.

Key Discussion Points

1. Relation to Existing Laws and Terms of Use (Issue 1902 / Issue 160)

Max presented a proposal stating that preference expressions are not intended to override website Terms of Use (ToU), and that in the event of a conflict, ToU should prevail as a matter of law.
Paul and Nate raised concerns that unilaterally declared ToU are distinct from negotiated bilateral agreements. They argued that explicitly stating that ToU "shall prevail as a matter of law" is problematic across varying jurisdictions and runs counter to the group's purpose of establishing machine-readable signals.
Mike emphasized that ToU generally govern human and logged-in user behavior, whereas machine-readable preferences are aimed at automated crawlers accessing public, non-authenticated content.
Meredith suggested a compromise: state that preferences do not override ToU, but stop short of declaring legal precedence, thereby treating human-use terms and technical-interaction preferences as separate domains.
Glenn noted that the IETF lacks the mandate to order legal instruments and contract formations.
Suresh and Paul observed that the draft's original language ("without prejudice to applicable laws") is a neutral, light-touch approach that avoids altering the legal status quo.

2. Applicability and Effect: Illustrative Lists vs. Exclusions (Issue 160 / Section 3.2)

The group debated a proposal by Alyssa to pare back Section 3.2, replacing the detailed, forward-looking list of permitted reasons for ignoring preferences with a high-level, historical summary of safety, accessibility, and public-interest practices.
Lila supported retaining an illustrative list of rationales (e.g., academic research, accessibility) to give risk-averse institutions (like libraries and universities) confidence that standard public-interest exceptions remain viable.
Tim (Timid Robot) proposed formalizing these examples into explicit "exclusions" in a new Section 3.2 to avoid having to protect these public-interest use cases individually inside every vocabulary category.
Glenn strongly opposed this direction, stating that carving out licensing-like exceptions or setting policy on internet access exceeds the working group's charter.
Suresh noted that Section 3.2 was originally introduced to clarify that preferences are not technical access controls. He indicated that a more minimal approach, akin to Alyssa's proposal, seemed to have broader support.

3. Conformance and Technical Limitations (Issue 164 / Section 3.1)

Suresh introduced Pull Request 1905, which replaces Section 3.1 ("Conformance") with a more precise description of the specification's technical boundaries.
The proposed text clarifies that the document's primary purpose is ensuring that expressed preferences are understood by recipients, rather than mandating compliance. It explicitly outlines technical limitations, including the lack of built-in origin authentication for certain attachment mechanisms (like robots.txt).

4. Distinguishing Access and Use (Issue 167 / Issue 150)

The group confirmed the conceptual separation between content acquisition (governed by robots.txt and access controls) and downstream use of the asset.
Ronnie supported this distinction, noting that the draft should focus solely on downstream usage preferences without dictating what governs the initial access. This conceptual alignment successfully resolves both Issue 167 and Issue 150.

5. AI "Use" / RAG Category (Issue 172 / Issue 150)

Brad and Nick argued that a preference targeting Retrieval-Augmented Generation (RAG) and downstream AI usage is needed. They cited data indicating that RAG-driven crawling is replacing traditional search traffic, significantly reducing referral rates for small publishers and risking brand damage through hallucinated or misattributed outputs (e.g., "Frankenstein recipes" or incorrect itineraries).
Paul noted that external initiatives like Rights Statement Language (RSL) and Content Signals contain "AI input" categories that are designed to be compatible with AIPREF.
Meredith raised a structural concern, suggesting that attempting to control downstream information use post-publication conflicts with the core open-web publishing bargain.
Kevin suggested exploring more granular display-based preferences, allowing publishers to opt into generative features only if proper attribution and links are guaranteed.

6. Defining "AI Training" (Issue 1908)

The group evaluated two competing definitions for the training preference: Option 1 (narrowly tailored to models with generative capabilities or a generative design purpose) and Option 2 (a broader definition covering general machine learning/AI models).
Tim and others expressed concern that Option 1's focus on "design purpose" allows a "bait-and-switch" scenario where a model trained for non-generative purposes is later used generatively.
A poll of the room indicated a strong preference for Option 2 (the broader definition).

7. Defining "Search" (PR 201 / PR 1909)

The group compared PR 201 (paragraph style) and PR 1909 (bulleted list).
Warren expressed concern that explicitly banning "generative" capabilities in search might inadvertently prohibit non-substantive AI utilities, such as intelligent image cropping/resizing or semantic search term matching.
A poll of the room indicated a preference to use PR 201 as the baseline.
A subsequent poll asked whether to keep or remove the word "verbatim" in relation to snippets. A strong majority (75% to 25%) favored removing "verbatim" so that a placeholder can be added for non-substantive accessibility transformations (e.g., translation and text-to-speech).
Nate and Tom cautioned that "non-substantive transformation" requires precise definition to avoid conflicting with other search exclusions.

Decisions and Action Items

Decisions (Subject to Mailing List Confirmation)

Issue 167 (Distinguishing Access and Use): Closed, with the understanding that access and downstream use are strictly separated.
AI Training Preference: Option 2 (broader AI model definition) was selected as the starting baseline. The vocabulary term will be updated to "AI training".
Search Preference: PR 201 was selected as the starting baseline. The word "verbatim" will be removed, and a placeholder for accessibility/translation transformations will be incorporated.

Action Items

Editors: Create and update Pull Requests for the "AI training" (Option 2) and "search" (PR 201 with accessibility edits) definitions to reflect the room's preferences.
Suresh: Submit PR 1905 (Section 3.1 Conformance) to the mailing list for a formal consensus call.
Nick, Brad, Kevin, and Suresh: Collaborate on drafting a concrete proposal for a "use/input" (RAG) category prior to the next interim.
Krishna: Open a GitHub issue to analyze how the new search preference interacts with legacy crawler directives (e.g., no-snippet).

Next Steps

The working group will conduct asynchronous consensus calls on the newly established baselines for the training and search terms.
Interim Schedule Plan:
- June 2026: Virtual interim meeting focusing on "use" category proposals and restarting discussions on the attachment specification.
- July 2026 (IETF 126 - Vienna): Request two 2-hour slots to socialize proposals with the wider IETF community.
- Late August / Early September 2026: Proposed hybrid interim meeting in Europe (target: London).

Automatic IETF Minutes