Empower Your Content with New AI Usage Standards

In my experience, the open web often feels like the Wild West, especially in recent times. Many creators, myself included, have watched as our hard work is scraped and fed into large language models without any hint of permission.

This situation has become a free-for-all, leaving website owners with almost no means to opt out or safeguard their creative endeavors. There have been attempts to address this, such as Jeremy Howard’s llms.txt initiative. Much like robots.txt helps us manage site crawlers, llms.txt aims to provide guidelines for AI companies’ crawling bots.

Unfortunately, there’s little proof that AI companies actually respect llms.txt or its guidelines. Additionally, Google has clearly stated it doesn’t support llms.txt.

However, a promising new protocol is on the horizon, potentially granting site owners like myself more control over how AI firms utilize our content. It looks like this might become part of robots.txt, allowing us to set definitive rules around AI system access and usage.

IETF AI Preferences Working Group

In response to this issue, the Internet Engineering Task Force (IETF) launched the AI Preferences Working Group in January of this year. Its mission is to craft standardized, machine-readable rules that let site owners state their AI usage preferences for their content.

Since its inception in 1986, the IETF has established core Internet protocols like TCP/IP, HTTP, DNS, and TLS. Now, they’re laying down foundations for the open web’s AI era. Leading this group are co-chairs Mark Nottingham and Suresh Krishnan, joined by figures from Google, Microsoft, Meta, and more.

Of particular interest is Google’s involvement via Gary Illyes, who is part of this working group.

The purpose of this group is clear:

“The AI Preferences Working Group will standardize building blocks that allow for expressing preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use.”

What the AI Preferences Group is Proposing

This group aims to deliver new standards that let site owners determine how LLM-powered systems can use their open web content. Its planned deliverables are:

A standards track document detailing a vocabulary to express AI-related preferences, independent of how those preferences are associated with content.
Standards track document(s) that explain how to associate these preferences with content using IETF-defined protocols and formats, for example, Well-Known URIs and HTTP response headers.
A standard approach for reconciling multiple preference expressions.

At the time of writing, nothing is set in stone yet. Early documents, however, provide a sneak peek into potential standards.

The working group published two key documents in August:

A Vocabulary For Expressing AI Usage Preferences
Associating AI Usage Preferences with Content in HTTP (with Illyes as a contributing author)

These documents propose significant updates to the Robots Exclusion Protocol (RFC 9309), suggesting new rules and definitions enabling site owners to specify AI content usage permissions.

[Figure: Diagram showing the relationship between categories of use, including foundation model, AI output, and search under automated processing.]

How It Might Work

AI systems on the web are categorized and assigned standard labels. Whether a directory will exist for site owners to identify system labels remains unclear.

Currently, the defined labels include:

search: for indexing/discoverability
train-ai: for general AI training
train-genai: for generative AI model training
bots: for all types of automated processing, such as crawling and scraping

For each label, you can set two values:

y to allow
n to disallow
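To make the label and value pairs concrete, here is a minimal Python sketch of how a tool might read them. The function name `parse_preferences` and the comma-separated input format are my own assumptions for illustration; the drafts are not final, and the label set is simply the one listed above.

```python
# Labels and values as listed in this post; the final standard may differ.
KNOWN_LABELS = {"search", "train-ai", "train-genai", "bots"}
VALUES = {"y": True, "n": False}


def parse_preferences(expression: str) -> dict[str, bool]:
    """Parse a string such as 'train-ai=n, search=y' into {label: allowed}.

    The comma-separated format is an assumption made for this sketch.
    """
    preferences: dict[str, bool] = {}
    for item in expression.split(","):
        label, sep, value = item.strip().partition("=")
        label, value = label.strip().lower(), value.strip().lower()
        # Skip anything outside the label and value sets described above.
        if not sep or label not in KNOWN_LABELS or value not in VALUES:
            continue
        preferences[label] = VALUES[value]
    return preferences


print(parse_preferences("train-ai=n, search=y"))
# {'train-ai': False, 'search': True}
```

Unknown labels are skipped rather than rejected here, which is just one possible choice for a vocabulary that is still changing.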

I found it interesting that these rules can be applied at the folder level and customized for different bots. In robots.txt, they’re implemented using a new Content-Usage field, akin to existing Allow and Disallow fields.

Here’s an example robots.txt that the working group shared in their document:

```
User-Agent: *
Allow: /
Disallow: /never/
Content-Usage: train-ai=n
Content-Usage: /ai-ok/ train-ai=y
```

Explanation
Content-Usage: train-ai=n indicates that, by default, no content on this domain may be used for AI training, whereas Content-Usage: /ai-ok/ train-ai=y makes an exception that permits training on content within the /ai-ok/ folder.
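To check my own reading of that example, I sketched how a crawler might resolve these rules for a given URL path. This is not the working group's algorithm: the helper names `parse_content_usage` and `is_allowed` are hypothetical, User-Agent groups are ignored for brevity, and I am assuming the longest matching path wins, as it does for Allow and Disallow in RFC 9309.

```python
ROBOTS_TXT = """\
User-Agent: *
Allow: /
Disallow: /never/
Content-Usage: train-ai=n
Content-Usage: /ai-ok/ train-ai=y
"""


def parse_content_usage(robots_txt: str) -> list[tuple[str, str, bool]]:
    """Collect (path_prefix, label, allowed) from Content-Usage lines."""
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "content-usage":
            continue
        parts = value.split()
        if not parts:
            continue
        # A leading token that starts with "/" is treated as a path;
        # otherwise the rule is assumed to cover the whole site.
        if parts[0].startswith("/"):
            path, parts = parts[0], parts[1:]
        else:
            path = "/"
        for expr in parts:
            label, eq, flag = expr.strip(",").partition("=")
            if eq and flag in ("y", "n"):
                rules.append((path, label, flag == "y"))
    return rules


def is_allowed(rules, url_path: str, label: str):
    """Return True or False from the longest matching prefix, or None if no rule applies."""
    best = None
    for path, rule_label, allowed in rules:
        if rule_label == label and url_path.startswith(path):
            if best is None or len(path) > len(best[0]):
                best = (path, allowed)
    return None if best is None else best[1]


rules = parse_content_usage(ROBOTS_TXT)
print(is_allowed(rules, "/blog/post", "train-ai"))   # False
print(is_allowed(rules, "/ai-ok/page", "train-ai"))  # True
print(is_allowed(rules, "/blog/post", "search"))     # None
```

Returning None when no rule matches keeps "no preference stated" distinct from an explicit y or n, which seems worth preserving until the drafts settle how missing preferences should be treated.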

Why Does This Matter?

There’s significant buzz about llms.txt within the SEO community and its use alongside robots.txt. Yet, no AI company has confirmed adherence to these guidelines, and Google disregards llms.txt.

Website owners, including myself, want more explicit control over how AI companies use our content, whether for training models or for RAG-based responses.

I feel that the IETF’s new standards signify positive progress. With Illyes as a contributing author, I remain optimistic that once finalized, companies like Google will embrace these standards, respecting new robots.txt rules during content scraping.

Inspired by this post on Search Engine Land.

FAQs

What is the IETF AI Preferences Working Group trying to standardize?

The IETF AI Preferences Working Group is working on standardized, machine-readable rules that let site owners express how their content may be collected and processed for AI model development, deployment, and use. The post says the group is exploring building blocks for AI usage preferences and ways to associate those preferences with content.

How is this different from llms.txt?

The post describes llms.txt as an initiative meant to guide AI crawling bots, similar in spirit to how robots.txt manages crawlers. It also notes that there is little proof AI companies respect llms.txt and that Google has stated it does not support it.

What AI usage labels are mentioned in the proposed standards?

The post lists search, train-ai, train-genai, and bots as currently defined labels. These relate to indexing and discoverability, general AI training, generative AI training, and automated processing such as crawling and scraping.

What values can site owners set for each AI usage label?

According to the post, each label can be set to y to allow or n to disallow. The example shows Content-Usage: train-ai=n to disallow training across a domain and Content-Usage: /ai-ok/ train-ai=y to allow training for a specific folder.

Where might AI usage preferences be applied on a website?

The post says the proposed rules may become part of robots.txt through a new Content-Usage field. It also notes that rules could be applied at the folder level and customized for different bots.

Why do these AI usage standards matter to website owners?

The article argues that website owners need clearer control over how AI companies use their content, including for model training and RAG-based responses. The author views the IETF work as positive progress toward more explicit rules for AI access and usage.
