**Session Date/Time:** 30 Sep 2025 10:30

# [AIPREF](../wg/aipref.html)

## Summary

The AIPREF working group session focused on two main areas: a detailed discussion of Krishna's proposed display-based vocabulary for content preferences, and the ongoing debate regarding the inclusion or exclusion of a top-level "automated processing" (or TDM) category in the vocabulary. While Krishna's proposal garnered interest for its fine-grained approach to controlling content use in search and AI experiences, concerns were raised about its scope, applicability to diverse media types, and compatibility with the existing charter. A strong sense of the room indicated a preference for exploring more fine-grained controls, while opinions remained divided on the necessity of a broad top-level category. The discussion highlighted the tension between comprehensive, hierarchical definitions and more specific, use-case-driven approaches, especially given the rapid evolution of AI technologies.

## Key Discussion Points

* **Krishna's Vocabulary Draft Proposal:**
    * **Objective:** To offer content creators more transparent and accountable options for dictating how their content is used in various experiences, including search and AI.
    * **Proposed Terms:**
        * `display text`: Controls whether identified text and associated reasoning can be displayed or processed for search/AI. Setting to 'no' prohibits both.
        * `display length`: Specifies the maximum amount (e.g., words, characters) of content to be used for experiences like AI summarization.
        * `exact text match`: If 'yes', allows summarization but requires exact reproduction of content, limited by `display length`, and suggests providing a backlink to the source.
        * `generative AI category`: A specific term related to generative AI.
    * **Core Concept:** Focuses on how content is used at the time of display or for processes related to display.
* **Scope and Applicability:**
    * A working group member (Leonard) initially criticized the proposal as too narrow, seeming limited to web-based, text-based content and not accounting for other media types (images, video) or offline scenarios.
    * Krishna clarified that the proposal's text does not exclusively limit its scope to the web and aims to cover "search, AI, summaries, etc.", noting that categories for images and videos are also part of the proposal.
    * The chair noted the proposal's parallels to Creative Commons in terms of controlling use, modification, and attribution.
* **Distinguishing Search Experiences:**
    * Discussion arose on how to differentiate between "traditional" search snippets and AI-generated summaries or syntheses of content. A desire was expressed to allow organic search links/snippets while prohibiting AI-driven summaries.
    * Krishna explained that setting `display text` to 'no' would still allow the URL and title to appear but prevent the document's content from being processed for AI summaries. Granular control is also offered via `display length` and `exact text match`.
    * Concern was raised about potential perverse incentives, where content creators might use images for primary information while filling readable text with spam to manipulate controls. Krishna noted that spam processing is a separate concern.
* **Nofollow and Processing for Non-Indexing:**
    * A question was raised regarding the interaction with `nofollow` and expressing preferences for parsing a page for reasons *other than* indexing.
    * Krishna clarified that `nofollow` pertains to indexing/crawling, while his proposal primarily dictates the *display* of content. Processing for spam or junk is not curtailed by these display preferences.
* **AI Training and Inference:**
    * The distinction between classical search relying on transformer models and generative AI was discussed.
      A question was posed on whether a search provider could train a model solely for classical search snippets if the `generative AI training` preference was blocked.
    * Krishna's proposal includes a specific `gen AI training` category to block training of generative AI models. `display text` set to 'no' implicitly prevents snippet generation.
    * The complexity of the pre-training vs. post-training distinction for non-experts led Krishna to favor simpler, more direct categories like `gen AI training`.
* **Desirable Statements vs. Vocabulary Structure:**
    * A working group member (Martin) suggested shifting the focus from "how to express X with this vocabulary" to first listing "what publishers want to express" (e.g., "small snippet OK, full summary NOT OK") and then designing the vocabulary to meet those needs.
    * This raised questions about whether controls should apply specifically to AI or to broader search engine functionalities, acknowledging that AI has intensified pre-existing issues of disintermediation.
* **Attachment Mechanisms and `3.2` Section:**
    * The document deliberately separates vocabulary from attachment mechanisms (e.g., HTTP headers, HTML tags) to allow initial focus on the terms themselves.
    * The controversial Section 3.2, which discusses circumstances under which preferences might be overridden (e.g., public safety, user-initiated requests), was debated. Some argued it is a necessary recognition of real-world complexities and of the limitations of preferences, while others felt it undermined the clarity and enforceability of the preferences, especially if the top-level categories were overly broad.
    * There was a suggestion to move implementation-related material, including Section 3.2, to attachment-specific documents to keep the vocabulary more flexible and focused.
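
The interaction of the three display terms discussed above can be sketched roughly as follows. This is an illustration only and is not taken from Krishna's draft: the `DisplayPrefs` class, the `check_display` function, the choice of words as the unit for `display length`, and the whitespace normalization are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisplayPrefs:
    """Hypothetical container for the display terms; not defined by the draft."""
    display_text: bool = True              # 'no' (False) prohibits display and processing
    display_length: Optional[int] = None   # assumed unit: words; None = no limit stated
    exact_text_match: bool = False         # 'yes' (True) requires verbatim reproduction


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so comparisons ignore layout differences."""
    return " ".join(text.split())


def check_display(candidate: str, source: str, prefs: DisplayPrefs) -> bool:
    """Return True if showing `candidate` (body text only) respects `prefs`.

    URL and title are outside this check; per the discussion they may still
    appear even when `display text` is 'no'.
    """
    if not prefs.display_text:
        return False
    if prefs.display_length is not None and len(candidate.split()) > prefs.display_length:
        return False
    if prefs.exact_text_match and _normalize(candidate) not in _normalize(source):
        return False
    return True


source = "The quick brown fox jumps over the lazy dog"
prefs = DisplayPrefs(display_length=5, exact_text_match=True)
print(check_display("quick brown fox", source, prefs))   # verbatim and within the limit
print(check_display("a fast brown fox", source, prefs))  # paraphrase, so not verbatim
```

A real mechanism would also have to settle the length unit and how a backlink is conveyed; neither was decided in the session.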
* **Top-Level Category (TDM/Automated Processing):**
    * Krishna proposed removing the existing top-level `TDM` (Text and Data Mining) or `Automated Processing` category, arguing it is "overly broad" and blocks too much processing.
    * Arguments for retention highlighted its alignment with existing legal frameworks (e.g., EU TDM opt-out) and its role in a hierarchical structure to cover unforeseen future uses ("default deny" for unknowns).
    * Arguments for removal emphasized that broad, legally-defined terms are not suitable for machine-readable signals, lead to misinterpretation, and may encourage over-compliance or ignoring of preferences due to ambiguity.
    * The challenge of designing machine-readable signals for complex legal contexts was a central theme.
* **Hierarchy vs. Flat Structure:**
    * The group debated the merits of a hierarchical vocabulary (which aims to cover the entire space, including unknown future uses, by defaulting to an opt-out for higher-level categories) versus a flat, fine-grained structure (which defines specific, known use cases one by one).
    * Some preferred the flat approach for its clarity and flexibility in a rapidly evolving technological landscape, while others valued the hierarchy for its ability to provide broad protection against unforeseen uses.

## Decisions and Action Items

* No formal decisions were reached on specific proposals or vocabulary terms during this session.
* The discussions aimed to gather the "sense of the room" on various approaches.

## Next Steps

* **Sense of the Room on Search Controls (Polls 1 & 2):**
    * A poll was taken on whether to limit "search" to a simple carve-out (conservative approach): 11 Yes, 16 No, 8 No Opinion. This indicates a sense against a purely conservative, carve-out approach.
    * A second poll asked whether to define more fine-grained search and other controls (expansive approach): 16 Yes, 4 No, 8 No Opinion. This indicates a strong sense in favor of pursuing more fine-grained controls.
    * Krishna is encouraged to further develop his display-based ideas, considering how to componentize them and minimize overlaps. Other members are also encouraged to propose alternative approaches that could find common ground.
* **Sense of the Room on Top-Level Category (Polls 3 & 4):**
    * A poll on whether the group "can live with shipping a vocabulary that *includes* a top-level category": 18 Yes, 11 No, 3 No Opinion.
    * A poll on whether the group "can live with shipping a vocabulary that *excludes* a top-level category": 20 Yes, 6 No, 7 No Opinion.
    * The polls suggest a slight preference among those present for being able to live *without* a top-level category, but significant divergence of opinion remains. This will be a key discussion point for the next session.
* **Ongoing Discussion:**
    * Discussion on the appropriate level of granularity, the balance between broad and specific preferences, and the role of the `3.2` exceptions section will continue.
    * Working group members are encouraged to consider the points raised regarding media types beyond text (e.g., images, video, audio) and non-display-based use cases (e.g., RAG index creation, code generation) for future iterations of proposals.
    * The next session will continue the debate, potentially starting with the assumption of excluding the top-level category and addressing objections.

---

**Session Date/Time:** 30 Sep 2025 07:15

# [AIPREF](../wg/aipref.html)

## Summary

The AIPREF Working Group meeting focused on assessing the current state of the group's work following the failure of the Working Group Last Call. Discussions centered on whether the charter remains fit for purpose, the urgency of delivering a solution, and strategies for staging deliverables. A series of informal polls was conducted to gauge the sense of the room on these strategic questions.
Key technical discussions revolved around the definition of "AI training" versus "AI use" or "purpose-based" preferences, and the role of the IETF vocabulary in a landscape of emerging de facto standards. The European Commission also offered its perspective on the value of a common vocabulary.

## Key Discussion Points

* **Co-chair's Statement on Affiliation:** Mark Nottingham (co-chair) clarified that his role in the working group is independent of his employer, citing his long-standing personal interest and IETF norms for wearing different hats.
* **Note Well and Logistics:** The chairs reviewed the IETF Note Well, anti-harassment procedures, IP policy, and privacy policy. Meeting logistics, including remote participation via Meetecho and speaking queue procedures, were also covered. A broken link to meeting arrangements was noted.
* **Working Group Status Overview:**
    * The co-chairs reported that the Working Group Last Call (WGLC) failed, indicating strong disagreements despite earlier perceived convergence.
    * An initial impression of possible consensus on "AI training preference" was noted, but other areas like "Search," "AI in use/inference preference," and the "umbrella preference" show little agreement and require substantial discussion.
    * The original goal was to provide building blocks for expressing preferences on content collection and processing for AI development, deployment, and use, aiming to improve upon the current lack of standardization.
    * The chairs posed three key questions for discussion:
        1. Is the charter still fit for purpose and achievable? Do changes need to be made, and can agreement be reached to take them to the IESG?
        2. Is there still a sense of urgency to deliver something soon (e.g., by end of year)?
        3. Should deliverables be staged (e.g., minimal core first, then additions)?
* **Informal Poll Results and Discussions:**
    * **Poll 1: Charter Goals Achievability:** A poll indicated that a majority of those present (24 Yes, 3 No, 11 No Opinion) believe it is possible to achieve consensus and meet charter goals in some form.
    * **Poll 2: Charter Scope Appropriateness:** Opinions were split (13 Yes, 15 No, 5 No Opinion) on whether the charter scope is appropriate or needs change.
        * **Disagreement on "AI" definition:** A participant (Pedro Ortiz Suarez, Common Crawl Foundation) argued that focusing too much on the "AI" term creates problems due to its vague definition in research (preferring "machine learning," "deep learning," etc.) and suggested focusing on "uses" of technologies rather than the term "AI" itself.
        * **Scope Narrowing vs. Broadening:** Concerns were raised that making the charter broader could hinder consensus, while narrowing deliverables might be achievable without a formal recharter. The rechartering process is lengthy.
        * **Content Owner Perspective:** A participant (Paul Keller, Open Future) emphasized that the charter's intent was to provide content owners with a way to express preferences, and that these preferences may not always reflect deep technical understanding but are still valid expressions.
        * **Precision vs. Ambiguity:** A participant (Eric W. Smith) highlighted the risk of ambiguous text becoming mandatory, leading to "sharp tools" for people who may not fully understand the technical implications.
    * **Poll 3: Sense of Urgency:** A poll indicated that a majority (12 Yes, 17 No, 4 No Opinion) does not feel urgency to deliver something by the end of the year. The chairs noted that milestones had recently been pushed back (from August 2025 to 2026) to allow for more inclusive participation and high-bandwidth discussions.
    * **Polls 4 & 5: Individual Commitment:** Participants showed strong commitment to continue working for six months (30 Yes, 3 No, 3 No Opinion) and one year (26 Yes, 6 No, 4 No Opinion).
    * **Poll 6: Staging Deliverables:** The room was divided (26 Yes, 12 No, 1 No Opinion) on whether to focus on a minimal training preference first and then other preferences.
        * **Arguments against staging:** Some participants (Fred Silva, Advance; Max Camilleri, News Corp; Paul Keller, Open Future; Chris Needham, BBC) argued that definitions are interconnected, staging could impact defaults, and deferring some preferences might lead to their eventual abandonment or loss of participant engagement.
        * **Arguments for staging:** Others saw it as a tactical way to achieve initial progress and demonstrate the group's ability to deliver, providing a "checkpoint" rather than a stopping point.
* **Vocabulary Role:** Paul Keller (Open Future) presented on the purpose of the vocabulary as a common set of well-defined terms for various attachment mechanisms (e.g., IPTC+, C2PA, RSL, Cloudflare Content Signals). He expressed concern that failure to produce an IETF-standardized vocabulary could lead to de facto standards emerging elsewhere, noting examples of existing efforts. He also clarified that the EU Code of Practice on GPAI models refers to *attachment mechanisms* (like robots.txt) rather than the vocabulary itself.
* **European Commission Perspective:** Stefano Gentile (European Commission, Copyright Unit) thanked the group and highlighted the EU's regulatory framework for AI. He stressed the importance of a common vocabulary with clear semantics to ensure effective expression and detection of preferences from rightsholders, supporting European policies irrespective of legal effects.
* **Discussion on Open Issues (Post-Break):**
    * The chairs proposed reviewing the 27 open issues, starting with broader "overall model" issues before individual preferences.
    * **Issue 159/170 (AI Training vs. Use/Purpose):** Eric W.
      Smith argued that objections to "AI training" are often rooted in concerns about subsequent "use" of data (e.g., economic impact, substitution, content generation) rather than the technical act of training itself. He suggested a purpose-based approach for a more expressive vocabulary.
        * **Counter-arguments:** Martin Iannuzzi (Cloudflare) and Paul Keller (Open Future) highlighted that some content owners have non-economic (e.g., moral rights) objections to their data being used in models, irrespective of the output.
    * **Vocabulary Granularity:** Concerns were raised by participants (Michael, Alfer; Pedro Ortiz Suarez, Common Crawl Foundation) about the difficulty of creating a static taxonomy that captures evolving concerns and technologies, suggesting that the vocabulary needs to be simple enough for non-technical users.
    * **Consequences of Preferences:** Martin Iannuzzi (Cloudflare) raised the fundamental question of whether the group should allow people to express preferences regardless of consequences, as long as those preferences are well-expressed. Suresh Krishnan (co-chair) noted the difficulty in defining "substitution" and its varied impact on content providers.
    * **Avoiding Product-Specific Focus:** A participant (Gary from Google) emphasized avoiding undue focus on one specific product or instantiation to ensure rules work broadly.

## Decisions and Action Items

* **Decision:** No formal decisions were made during this session, as polls were intended for gathering information and gauging the sense of the room.
* **Action Items:**
    * **Christian (Alfer):** To prepare a constructive proposal/presentation for a later session, outlining ideas for addressing issues (02:31:00).
    * **Co-chairs:** To continue guiding discussion on the overall model and high-level issues, including the fundamental question of what it means to express a preference (02:34:00).
    * **Working Group:** To consider empirical studies to solicit feedback from the broader public on desired preference expressions (suggested by Karen, 02:28:03).

## Next Steps

The meeting adjourned for lunch, with participants encouraged to engage in informal discussions. The group will reconvene at 1:00 PM local time to continue discussions, focusing on the high-level issues raised, particularly Christian's draft and the foundational question of what constitutes a valid and expressible preference. A different Meetecho link will be used for the afternoon session.