Is Your WordPress Blocking AI Bots? Discover the Hidden Barriers

```json
{
  "alt": "Illustration of robots representing Amazonbot, ClaudeBot, GPTBot blocked by a WordPress shield, while ChatGPT-User and PerplexityBot pass through.",
  "caption": "Blocked and Approved: Illustration shows which bots WordPress shields and allows.",
  "description": "This image features a digital illustration where three blue robots labeled Amazonbot, ClaudeBot, and GPTBot are stopped by a symbolic WordPress shield, indicated by red X marks. In contrast, two orange robots labeled ChatGPT-User and PerplexityBot successfully pass through, denoted by green check marks. The image highlights a security and permission concept, emphasizing trusted and restricted access through website protection. Keywords include WordPress, robots, security, access, and digital illustration."
}
```

When I first looked at my SEO data, everything seemed perfectly fine. Google Search Console metrics, traffic, and indexing were all normal, with no red flags. Then I dug deeper using Scrunch, our AI citation monitoring tool, to examine platform presence for searchinfluence.com over the past 30 days.

Here’s what I found: Google AI Mode showed a presence of 37.8%, Copilot 22.2%, Google Gemini 16.3%, ChatGPT 9.6%, and Perplexity 7.8%. Alarmingly, Claude and Meta AI were both at 0.0%.

```json
{
  "alt": "Bar chart showing rate-limiting of AI training crawlers vs. user-facing crawlers. Amazonbot leads with 51% throttling.",
  "caption": "AI training crawlers like Amazonbot face significant throttling, with up to 51% rate-limiting, unlike user-facing crawlers.",
  "description": "This chart illustrates the percentage of HTTP 429 rate-limiting experienced by AI training crawlers versus user-facing crawlers from April 4-10, 2026. Amazonbot is most heavily throttled at 51%, while ClaudeBot and GPTBot both face 29% throttling. PerplexityBot and ChatGPT-User encounter no rate-limiting. The data is sourced from Cloudflare GraphQL Analytics via searchinfluence.com, excluding Bytespider."
}
```

Two platforms had zero presence. Given that every crawler reads the same site, differences in content quality or topical authority couldn’t explain this discrepancy. The only factor that varied was crawler access.

To understand this further, I analyzed seven days of Cloudflare logs and found 29,099 bot requests, 65.8% of them from AI bots. The share of requests rate-limited with HTTP 429 (“Too Many Requests”) varied widely by bot user-agent.
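The per-user-agent breakdown described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual analysis: it assumes the log rows have already been exported as dictionaries with hypothetical `user_agent` and `status` keys, and the sample rows are made up.

```python
from collections import Counter


def rate_limited_share(records):
    """Return {user_agent: fraction of its requests answered with HTTP 429}."""
    totals = Counter()
    limited = Counter()
    for rec in records:
        ua = rec["user_agent"]
        totals[ua] += 1
        if rec["status"] == 429:
            limited[ua] += 1
    return {ua: limited[ua] / totals[ua] for ua in totals}


# Made-up rows for illustration only; these are not the real log data.
sample = [
    {"user_agent": "ExampleTrainingBot", "status": 429},
    {"user_agent": "ExampleTrainingBot", "status": 200},
    {"user_agent": "ExampleUserBot", "status": 200},
]

shares = rate_limited_share(sample)
for ua, share in sorted(shares.items()):
    print(f"{ua}: {share:.0%}")
```

Grouping by user-agent before computing the share is what exposes the pattern: an overall 429 rate across all bots would hide the fact that some crawlers are throttled heavily and others not at all.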

```json
{
  "alt": "Flowchart showing request path for ClaudeBot/GPTBot with focus on where 429 error fires.",
  "caption": "Unraveling the mystery of the 429 error, this infographic visually maps the request path for ClaudeBot/GPTBot and reveals the platform level where issues arise.",
  "description": "This flowchart details the request path for ClaudeBot/GPTBot/Amazonbot through Cloudflare, WP Engine Edge, and WordPress Origin. It highlights that the 429 error fires at the WP Engine Edge level, which is not visible to customer dashboards and lacks documented opt-out. The chart illustrates stages of the request process and their controllability, emphasizing the point of error data for developers and SEO analysts."
}
```

Training crawlers that make bulk requests are throttled, while user-facing crawlers that mimic human pacing during live queries aren’t. To give a sense of that bulk, ClaudeBot made 20,583 crawl requests for every referral it returned.

```json
{
  "alt": "Bar graph showing block rates of AI bots by user-agent.",
  "caption": "This chart reveals selective blocking of AI bots by their user-agents, with some completely blocked while others are allowed.",
  "description": "The image presents a bar graph depicting the block rate of various AI bots by user-agent on searchinfluence.com as of April 2026. Amazonbot, ClaudeBot, and Bytespider are 100% blocked, while GPTBot is 80% blocked. CCBot and anthropic-ai show 0% block rate. The graph highlights selective blocking, where some user-agents face significant access restrictions, while others pass without blocks. Keywords: AI bots, user-agent, block rate, HTTP response."
}
```

I assumed the 429 errors originated at Cloudflare, perhaps from a web application firewall (WAF) rule, or from security plugin interference. I went down a rabbit hole investigating each layer, which was time-consuming and ultimately unnecessary.

```json
{
  "alt": "Bar chart comparing bot crawl success rate and AI citation presence across four platforms.",
  "caption": "Exploring bot crawl success versus AI citation presence: Google and Perplexity excel, while ChatGPT and Claude face challenges.",
  "description": "This bar chart presents a comparison between bot crawl success rate and AI citation presence for four platforms: Google AI Mode Googlebot, ChatGPT GPTBot, Perplexity PerplexityBot, and Claude ClaudeBot. Google and Perplexity show 100% crawl success, but only Google achieves significant citation presence at 37.8%. ChatGPT and Claude face lower citation visibility. Data from Cloudflare GraphQL Analytics and Scrunch AI highlight the discrepancies between access and citation outcomes."
}
```

The truth emerged when I ran a reproduction test with curl: the block was keyed to the user-agent, not the path or the request rate. The source became clear when I noticed the x-powered-by response header: our site was hosted on WP Engine, and the block came from their platform infrastructure.
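The same kind of reproduction test can be sketched in Python instead of curl. This is an illustrative sketch under assumptions, not the exact commands used: it requests one URL once per user-agent, so path and rate stay constant and only the `User-Agent` header changes. The `fake_fetch` stand-in, the `ExampleHost` value, and the `BlockedBot` name are hypothetical and exist only so the example runs without touching a real site.

```python
import urllib.error
import urllib.request


def fetch_status(url, user_agent, timeout=10):
    """Request url with the given User-Agent; return (status, x-powered-by value)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("x-powered-by")
    except urllib.error.HTTPError as err:
        # Error responses such as 429 are raised as HTTPError but still carry headers.
        return err.code, err.headers.get("x-powered-by")


def probe(url, user_agents, fetch=fetch_status):
    """Send one request per user-agent to the same URL and collect the results."""
    return {ua: fetch(url, ua) for ua in user_agents}


def fake_fetch(url, user_agent):
    # Hypothetical stand-in for a server that blocks by user-agent substring.
    if "BlockedBot" in user_agent:
        return 429, "ExampleHost"
    return 200, "ExampleHost"


results = probe("https://example.com/", ["BlockedBot/1.0", "Mozilla/5.0"], fetch=fake_fetch)
for ua, (status, powered_by) in results.items():
    print(ua, status, powered_by)
```

To run it against a live site, call `probe(url, user_agents)` without the `fetch` argument. A 429 on the very first request from one user-agent, alongside a 200 for another on the same path, points to user-agent matching rather than rate limiting.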

I then tested other AI bot user-agents (UAs) and recorded a fingerprint for each, which showed that the blocklist was outdated. Some bots were blocked, while others, such as Common Crawl, passed through unaffected.
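One way to compare fingerprints across user-agents is to group the bots that produced identical responses. The sketch below assumes a fingerprint is simply a (status code, response header value) pair, which is a guess at the idea rather than the exact fingerprint used here. The bot names and observations are placeholders, not measured results.

```python
from collections import defaultdict


def group_by_fingerprint(observations):
    """Group user-agents that produced the same (status, header value) pair."""
    groups = defaultdict(list)
    for user_agent, fingerprint in observations.items():
        groups[fingerprint].append(user_agent)
    return {fp: sorted(uas) for fp, uas in groups.items()}


# Made-up observations for illustration; these are not the measured results.
observations = {
    "ExampleBotA": (429, "ExampleHost"),
    "ExampleBotB": (429, "ExampleHost"),
    "ExampleBotC": (200, "ExampleHost"),
}

groups = group_by_fingerprint(observations)
for fingerprint, user_agents in sorted(groups.items()):
    print(fingerprint, user_agents)
```

Bots that land in the 429 group are on the blocklist, and any well-known crawler that lands in the 200 group is a hint that the list has not kept up.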

In conclusion, while WP Engine’s firewall, documented on their support page, was intended as a security measure, it wasn’t transparent to customers. Identifying these blocks requires specific diagnostic steps, and the process taught me much about managed hosting’s hidden layers.


Inspired by this post on Search Engine Land.


FAQs

What caused the AI bot block on the WordPress site?

The block originated from WP Engine’s firewall infrastructure, triggered by the crawler’s user-agent rather than the path or rate. A reproduction test with curl confirmed the user-agent-based block.

What did the Cloudflare logs reveal about bot requests?

Cloudflare logs showed 29,099 bot requests over seven days, 65.8% of them from AI bots. The share of requests rate-limited with HTTP 429 varied by bot user-agent.

Which AI platforms showed zero presence in the analysis?

Claude and Meta AI had 0.0% presence in the data.

What did the author conclude about WP Engine's firewall?

It is a security measure, but not transparent to customers. Diagnosing blocks required specific steps and revealed hidden layers in managed hosting.

What steps did the author take to investigate blocking?

The author tested other AI bot user-agents, fingerprinted each, and discovered that the blocklist was outdated. Some bots were blocked while others, like Common Crawl, passed.
