Zach Jackson

If you’ve seen the recent chatter around robots.txt updates and new files like ai.txt and llms.txt, you’re not alone. Many marketers and site owners are wondering whether they need to take action to protect their content from AI crawlers, or if this is yet another technical storm in a teacup.

The short answer? For most brands and businesses right now, there’s little to worry about. But it’s worth understanding what’s changing, what’s hype, and where things might be heading.

A quick refresher: What robots.txt actually does

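Every website has the option to include a robots.txt file: a simple text file that tells search engine crawlers which parts of a site they may crawl. It’s been around for decades and remains an important way to help search engines focus on the right pages and reduce server strain. It is not a security control, though. The file is publicly readable and compliance is voluntary, so genuinely sensitive areas need proper access controls, and keeping a page out of search results takes a noindex directive rather than a crawl block.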

Traditionally, this file has guided legitimate crawlers such as Googlebot or Bingbot. But the landscape has shifted with the rise of AI models and generative search tools that scrape content not just to index it, but to train on or summarise it. These newer systems don’t always follow the old rules, which is where the confusion starts.
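
To make the mechanics concrete, here is a minimal Python sketch of how a compliant crawler might evaluate a robots.txt file, using the standard library’s urllib.robotparser module. The rules, the example.com domain and the allowed helper are illustrative assumptions rather than anything taken from a real site; GPTBot is the user agent token OpenAI publishes for its crawler. It only shows what a well-behaved crawler would conclude. Nothing in the file can force a crawler to run this check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot everywhere,
# and keep every other crawler out of /admin/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the rules above let `agent` fetch `path`."""
    return parser.can_fetch(agent, "https://example.com" + path)


print(allowed("Googlebot", "/blog/post"))    # True
print(allowed("Googlebot", "/admin/login"))  # False
print(allowed("GPTBot", "/blog/post"))       # False
```

The same file gives different answers to different crawlers, which is why a clean robots.txt is still the first place to state your preferences.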

ai.txt and llms.txt - New instructions for AI crawlers

In response to crawlers from AI systems such as ChatGPT, Perplexity, and Google’s own AI Overviews and AI Mode, the industry has begun experimenting with new text files to set rules when they visit your website. These are primarily ai.txt and llms.txt.

In theory, these files act like an expanded robots.txt, letting site owners say how (or if) they want their content used by AI systems. For example, they might specify whether a site’s content can be used for AI training, summarisation, or search results.

In practice, though, both are still very early stage. There’s no universal agreement on how AI systems should interpret them, and compliance is entirely voluntary. Some AI crawlers ignore these signals altogether, meaning they may not offer much real protection yet.

Here’s a brief comparison of the two new text files:

  • llms.txt was designed to help AI tools understand or cite content more accurately, but maintaining it can be complex. There’s also a risk of unintentionally exposing more information than intended.
  • ai.txt is simpler and is emerging as the more likely long-term option, but it’s not yet consistently recognised by major platforms.

At the moment, implementing either doesn’t deliver SEO or visibility benefits — and may introduce more work than value for most sites.
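
For a sense of what llms.txt involves, the sketch below builds a small file in the layout we understand the public llms.txt proposal to describe: a Markdown title, a one-line summary in a blockquote, then sections of annotated links. The build_llms_txt helper, the brand name and the example.com URL are all hypothetical, and the layout should be checked against the proposal itself before anyone relies on it.

```python
def build_llms_txt(site_name, summary, sections):
    """Assemble llms.txt-style Markdown from a title, summary and link sections.

    `sections` maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content for illustration only.
llms_txt = build_llms_txt(
    "Example Brand",
    "Hypothetical summary of what the site offers.",
    {
        "Guides": [
            (
                "Getting started",
                "https://example.com/guides/start.md",
                "Overview for new customers",
            ),
        ],
    },
)
print(llms_txt)
```

Even in this tiny form, every listed page is something a person has to choose, describe and keep current, which is where the maintenance burden and the risk of over-exposure come from.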

Cloudflare’s “Content Signals” — a step, not a solution

You might have seen Cloudflare mentioned in discussions around AI crawling. Cloudflare is a widely used web security and performance platform that sits between a website and the wider internet. It helps protect sites from attacks, improve load speed, and manage how different types of traffic reach your server.

Because Cloudflare already plays a central role in how many websites handle bots and crawlers, it has recently introduced a feature called Content Signals, which builds on robots.txt to help communicate how site content should be used for AI training or discovery. It’s a helpful innovation, especially given concerns about aggressive crawlers like Perplexity AI.

However, this too relies on voluntary compliance. Crawlers that already ignore robots.txt are unlikely to start respecting these new signals. So, while it’s a welcome development, it’s more a statement of intent than a guaranteed safeguard.

The main purpose of Cloudflare’s “Content Signals” feature

Cloudflare’s main target with this feature is Google or, more specifically, Google’s AI Overviews. That is because there is currently no way to stop AI Overviews from accessing and using your content without also removing that content from Google’s main search index.

Removing it would mean your content no longer shows up anywhere in Google’s traditional search results (the 10 blue links), whether on page one or any later page.

Cloudflare’s goal is to give website owners the option to say to Google:

“We want our content indexed, but we’re not OK with our content being used by or in AI Overviews.”

Cloudflare is confident that Google and other AI providers will cooperate, but for now, the standard is not enforceable.
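
As a rough illustration, the preference quoted above could be written as a single extra line in robots.txt. The sketch below assumes the directive is called Content-Signal and accepts search, ai-input and ai-train values of yes or no, which is our understanding of Cloudflare’s published policy and should be verified against its documentation. The parse_content_signals helper is hypothetical and deliberately simplified: it ignores which user agent group a line belongs to.

```python
# Hypothetical robots.txt carrying a Content Signals line (assumed syntax).
ROBOTS_TXT = """\
User-agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
"""


def parse_content_signals(robots_txt: str) -> dict[str, bool]:
    """Collect every Content-Signal key as True ("yes") or False (anything else)."""
    signals: dict[str, bool] = {}
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            key, eq, flag = pair.partition("=")
            if eq:
                signals[key.strip().lower()] = flag.strip().lower() == "yes"
    return signals


signals = parse_content_signals(ROBOTS_TXT)
print(signals)  # {'search': True, 'ai-input': False, 'ai-train': False}
```

Reading the line is trivial. The open question is whether the crawler on the other end chooses to honour it.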

Learn more about Cloudflare’s efforts in our Q3 “Marketing Matters” roundup of essential marketing news.

TDMP recommendations regarding text files for AI crawlers

At TDMP, we’ve reviewed the current landscape carefully, and our view is straightforward:

For most brands, there’s no urgent need to overhaul your setup or rush into new AI standards. Here’s what’s worth doing now — and what isn’t:

  • Do make sure your site’s structured data and metadata are complete and accurate. This remains the best way to help both search engines and AI systems understand your content.
  • If you’re using Cloudflare, consider enabling the new Content Signals — it’s an easy, low-risk way to declare your AI usage preferences publicly.
  • Keep an eye on ai.txt developments, but there’s no need to implement it until there’s widespread adoption.
  • Avoid llms.txt for now. It’s complex, offers little tangible benefit, and could even expose your content unnecessarily.
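
As a rough illustration of the structured data point above, this sketch builds a schema.org Article description as JSON-LD and flags any empty fields before publishing. The REQUIRED_FIELDS checklist and the missing_fields helper are our own assumptions for the example, not a rule set defined by schema.org or any search engine, and the article values are placeholders.

```python
import json

# Assumed house checklist for the example, not an official requirement.
REQUIRED_FIELDS = ("@context", "@type", "headline", "author", "datePublished")


def missing_fields(doc: dict) -> list[str]:
    """Return the checklist fields that are absent or empty in `doc`."""
    return [field for field in REQUIRED_FIELDS if not doc.get(field)]


# Placeholder article with the publication date left blank on purpose.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "",
}

print(missing_fields(article))  # ['datePublished']

# The JSON below is what would sit inside a
# <script type="application/ld+json"> tag on the page.
print(json.dumps(article, indent=2))
```

A check like this can run as part of a publishing workflow so that incomplete metadata is caught before a page goes live.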

Who should get more serious about implementing text files for AI crawlers?

If you’re running a large content platform, news publisher, or organisation whose material is frequently repurposed by AI tools, you may want to explore these options sooner.

But for the vast majority of marketing websites, ecommerce sites, and brand pages, your best move right now is to focus on the fundamentals: keep your site technically sound, your structured data up to date, and your content high quality.

The bottom line

AI crawlers and new file standards like ai.txt are interesting developments — and they’ll likely become more relevant as adoption matures. But today, they’re largely experimental.

For now, sticking with a clean robots.txt file, solid structured data, and good site hygiene remains the most effective and supported approach. TDMP will continue to track this evolving space and share updates as it becomes clearer which standards truly matter.
