Tag: Cloudflare

  • Is Your WordPress Blocking AI Bots? Discover the Hidden Barriers

    Is Your WordPress Blocking AI Bots? Discover the Hidden Barriers

    When I first looked at my SEO data, everything seemed perfectly fine. All metrics from Google Search Console, traffic, and indexing were normal without any red flags. But then, I decided to dig deeper using Scrunch, our AI citation monitoring tool, to examine the platform presence for searchinfluence.com over the past 30 days.

    Here’s what I found: Google AI Mode showed a presence of 37.8%, Copilot at 22.2%, Google Gemini at 16.3%, ChatGPT at 9.6%, and Perplexity at 7.8%. Alarmingly, both Claude and Meta AI were at 0.0%.

    ```json
{
  "alt": "Bar chart showing rate-limiting of AI training crawlers vs. user-facing crawlers. Amazonbot leads with 51% throttling.",
  "caption": "AI training crawlers like Amazonbot face significant throttling, with up to 51% rate-limiting, unlike user-facing crawlers.",
  "description": "This chart illustrates the percentage of HTTP 429 rate-limiting experienced by AI training crawlers versus user-facing crawlers from April 4-10, 2026. Amazonbot is most heavily throttled at 51%, while ClaudeBot and GPTBot both face 29% throttling. PerplexityBot and ChatGPT-User encounter no rate-limiting. The data is sourced from Cloudflare GraphQL Analytics via searchinfluence.com, excluding Bytespider."
}
```

    Two platforms had zero presence. Given that every crawler reads the same site, differences in content quality or topical authority couldn’t explain this discrepancy. The only factor that varied was crawler access.

    ```json
{
  "alt": "The CapmatchOne logo with a gradient circle and bold text.",
  "caption": "Discover innovation with the CapmatchOne logo, featuring sleek typography and a modern gradient circle.",
  "description": "The CapmatchOne logo features bold, modern typography coupled with a gradient circle, symbolizing connection and innovation. The sleek design conveys a sense of progress and creativity. This image can be used for branding or promotional purposes, appealing to audiences interested in innovative solutions and forward-thinking designs."
}
```

    To understand this further, I analyzed seven days of Cloudflare logs and discovered 29,099 bot requests, with 65.8% involving AI bots. The requests rate-limited with HTTP 429, or “too many requests,” were interestingly varied by bot user-agent.

    ```json
{
  "alt": "Flowchart showing request path for ClaudeBot/GPTBot with focus on where 429 error fires.",
  "caption": "Unraveling the mystery of the 429 error, this infographic visually maps the request path for ClaudeBot/GPTBot and reveals the platform level where issues arise.",
  "description": "This flowchart details the request path for ClaudeBot/GPTBot/Amazonbot through Cloudflare, WP Engine Edge, and WordPress Origin. It highlights that the 429 error fires at the WP Engine Edge level, which is not visible to customer dashboards and lacks documented opt-out. The chart illustrates stages of the request process and their controllability, emphasizing the point of error data for developers and SEO analysts."
}
```

    Training crawlers that make bulk requests are throttled, while user-facing crawlers that mimic human pacing during live queries aren’t. For example, ClaudeBot made 20,583 crawl requests for each referral returned.

    ```json
{
  "alt": "Bar graph showing block rates of AI bots by user-agent.",
  "caption": "This chart reveals selective blocking of AI bots by their user-agents, with some completely blocked while others are allowed.",
  "description": "The image presents a bar graph depicting the block rate of various AI bots by user-agent on searchinfluence.com as of April 2026. Amazonbot, ClaudeBot, and Bytespider are 100% blocked, while GPTBot is 80% blocked. CCBot and anthropic-ai show 0% block rate. The graph highlights selective blocking, where some user-agents face significant access restrictions, while others pass without blocks. Keywords: AI bots, user-agent, block rate, HTTP response."
}
```

    My assumption was that the 429 errors originated from Cloudflare, perhaps due to a web application firewall (WAF) or security plugin interference. I went down a rabbit hole investigating multiple layers. It was time-consuming and ultimately unnecessary.

    ```json
{
  "alt": "Bar chart comparing bot crawl success rate and AI citation presence across four platforms.",
  "caption": "Exploring bot crawl success versus AI citation presence: Google and Perplexity excel, while ChatGPT and Claude face challenges.",
  "description": "This bar chart presents a comparison between bot crawl success rate and AI citation presence for four platforms: Google AI Mode Googlebot, ChatGPT GPTBot, Perplexity PerplexityBot, and Claude ClaudeBot. Google and Perplexity show 100% crawl success, but only Google achieves significant citation presence at 37.8%. ChatGPT and Claude face lower citation visibility. Data from Cloudflare GraphQL Analytics and Scrunch AI highlight the discrepancies between access and citation outcomes."
}
```

    The truth emerged when I performed a reproduction test using curl requests, revealing that the block was based on user-agent, not path or rate. The realization hit when I discovered the x-powered-by header: WP Engine hosted our site, and the block came from their platform infrastructure.

    I then tested other AI bot UAs and crafted a fingerprint for each, discovering that the blocklist was outdated. While some bots were blocked, others like Common Crawl passed through unaffected.

    In conclusion, while WP Engine’s firewall, documented on their support page, was intended as a security measure, it wasn’t transparent to customers. Identifying these blocks requires specific diagnostic steps, and the process taught me much about managed hosting’s hidden layers.


    Inspired by this post on Search Engine Land.


    crushpress.ai community screenshot
  • AI Bots Could Dominate Internet Usage by 2027

    AI Bots Could Dominate Internet Usage by 2027

    I recently heard Cloudflare CEO Matthew Prince predict a fascinating future where AI bots might outnumber us humans on the web by 2027. The surge of agent-driven browsing, paired with the rise of generative AI, could really shake things up online.

    During his talk at SXSW, Prince warned us that bots are already transforming how we use and monetize the internet. This got me thinking about the big shift in search as more people rely on AI-generated answers instead of traditional clicks.

    Why this matters to me. With the prospect of bots becoming the main users of the web, I’ll need to adapt my strategy. Ensuring AI systems can access and trust my content will be crucial for staying relevant.

    Details from Prince. According to Prince, AI agents collect far more information than we do because of their unique browsing habits. While I might visit five sites for a purchase, an AI could browse thousands, generating significant traffic and load.

    Prince also pointed out the rapid changes in the internet’s baseline.

    He said that, for a long time, about 20% of web traffic was from bots, but by 2027, this could surpass human traffic.

    This isn’t a sudden spike, like during COVID-19; it’s a steady increase with no signs of slowing down.

    The broader implications. Prince compared this shift to other digital transformations, like mobile and social media. However, the difference here is profound: users may stop visiting websites directly, relying instead on AI interfaces for aggregated answers.

    The traditional business model of attracting traffic and selling through ads is under threat. After all, bots don’t click on ads, and customers are more likely to trust an AI’s output without further clicks.

    AI sandboxes. I found Prince’s vision of “AI sandboxes” particularly intriguing. These temporary environments for AI agents could appear and disappear millions of times per second, impacting how computing works behind the scenes.

    Such changes will undoubtedly put sustained pressure on our internet infrastructure as traffic continues to grow.

    Business ramifications. Companies are already debating how to adapt to AI’s influence, and there’s no clear consensus yet. Prince highlighted how the nature of bots might sever the direct relationship between businesses and their customers, as bots don’t prioritize brands.

    For content creators like me. AI can be both a challenge and an opportunity. It might reduce direct traffic, challenging ad-based models, but it also creates demand for unique, original data, which AI companies may pay for.

    Local media could thrive by licensing specific content to AI companies, potentially earning more than through digital ads.

    For small businesses. Prince put it wisely: AI agents prioritize price, quality, and efficiency over brand loyalty. This means traditional trust shortcuts might not hold any longer, driving towards relentless aggregation.

    Future considerations. The next era hinges on finding ways to balance control and compensation for content producers and providers. In Prince’s words, “There has to be some exchange of value.”

    The fundamental question remains unanswered: what will be the future business model of the internet?

    For more insights, check out the SXSW interview: The Internet After Search.


    Inspired by this post on Search Engine Land.


    crushpress.ai community screenshot
  • Cloudflare’s Markdown Feature: A Game Changer or a Cloaking Risk?

    Cloudflare’s Markdown Feature: A Game Changer or a Cloaking Risk?

    Yesterday, I stumbled upon some exciting news from Cloudflare. They’ve introduced a feature called Markdown for Agents, which provides machine-friendly versions of web content alongside the traditional pages we all see.

    Cloudflare describes this update as a proactive measure in response to increasing AI crawler activities and agentic browsing.

    When a client requests text/markdown, Cloudflare fetches the HTML from the origin server, converts it right at the edge, and then hands over a Markdown version.

    Interestingly, the response includes a token estimate header, which helps developers like me manage context windows more effectively.

    Early feedback highlighted not only the efficiency gains but also the potential implications of offering alternate representations of web content.

    What’s happening. Being part of the 20% of the web that Cloudflare powers, I learned that Markdown for Agents utilizes standard HTTP content negotiation. If a client sends an Accept: text/markdown header, Cloudflare immediately converts the HTML response on-the-fly to Markdown format. The response, marked with Vary: accept, ensures caches store separate versions.

    Cloudflare views this opt-in feature as a shift in content discovery and consumption, benefitting AI crawlers and agents with its structured text that requires less overhead.

    They claim Markdown can reduce token usage by up to 80% compared to HTML, which is quite impressive!

    Security concern. SEO consultant David McSweeney raised a concern, citing that Cloudflare’s Markdown for Agents feature might make AI cloaking incredibly simple because the Accept: text/markdown header tips off origin servers that the request is AI-related.

    Regular requests deliver the usual content, but those for Markdown can trigger a unique HTML response that gets converted for AI consumption, McSweeney explained on LinkedIn.

    The worry is that sites might inject hidden instructions, altered product data, or other machine-only content, creating a hidden “shadow web” for bots, unless the header is stripped before reaching the origin.

    Google and Bing’s markdown smackdown. Here’s the kicker. Representatives from Google and Microsoft advised against creating separate markdown pages for large language models. Google’s John Mueller noted:

    “Given that LLMs have always trained on and parsed normal web pages, it seems obvious they have no issues with HTML. Why serve a page that no end user sees? Plus, if they validate equivalence, why not stick to HTML?”

    Microsoft’s Fabrice Canel added:

    “Do you really want to double crawl load? We’ll check for similarity anyway. Non-user versions (like crawlable AJAX) are often neglected and broken. Human oversight fixes both user and bot views. Schemas help, and AI makes us even better at deciphering web pages. Less is more in SEO!”

    Cloudflare’s feature doesn’t generate another URL but does create varied representations based on request headers.

    The case against markdown. Technical SEO consultant Jono Alderson pointed out that once a machine-targeted representation exists, platforms must choose to trust it, verify it against the human version, or outright ignore it:

    “Flattening a page to markdown doesn’t only remove clutter. It strips away judgment and context.”

    “The instant you publish a machine-exclusive page representation, you craft a secondary candidate version of reality. Regardless of source promises or claims of identical content, a system now views two representations and must determine the true reflection of the page.”

    Dig deeper. Why LLM-only pages aren’t the answer to AI search

    Why we care. With Cloudflare’s advancements, AI ingestion might become more cost-effective and streamlined. But does serving distinct content to humans and crawlers verge on cloaking? Stay tuned…


    Inspired by this post on Search Engine Land.


    crushpress.ai community screenshot
  • Boost Your Website’s AI Visibility: Overcome Crawling Hurdles

    Boost Your Website’s AI Visibility: Overcome Crawling Hurdles

    Have you ever wondered why your site isn’t getting the attention it deserves from AI crawlers? I know how frustrating it can be to feel overlooked in the digital world. Often, Cloudflare might be the culprit blocking access.

    Let me guide you through diagnosing these issues, providing solutions, and optimizing your site for better LLM (Large Language Model) visibility. Together, we’ll ensure your site is primed for the AI-age and ready to capture its rightful place in search rankings.


    Inspired by this post on HiGoodie Blog.


    crushpress.ai community screenshot