I recently came across some updates Google made to its help documents, clarifying how much of a file Googlebot will crawl for different file types.
The updates spell out the limits by file type. Some carry over from previous guidelines and aren’t entirely new. They cover:
15MB for web pages: By default, Google’s crawlers process only the first 15MB of a file, and any content beyond that limit is ignored.
64MB for PDF files: Googlebot has a larger limit for PDFs and crawls up to the first 64MB when indexing them for Google Search.
2MB for other supported file types: For supported file types other than PDFs, Googlebot processes the first 2MB.
Rest assured, these limits are generous, and most websites will never reach them.
Google’s documentation explains, “By default, Google’s crawlers only process the first 15MB of a file. Individual projects may have different limits, and they might differentiate between file types, providing larger limits for PDFs compared to HTML.”
Furthermore, Googlebot halts the fetch once the limit is reached, so data beyond it doesn’t get indexed. The limit applies separately to each resource referenced in the HTML, like CSS and JavaScript; PDFs are the exception, with their own larger limit.
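To make the cutoff concrete, here is a rough Python sketch of what "only the first N MB is processed" means for a single file. It is an illustration, not Google's implementation: the `LIMITS` key names and the `split_at_limit` helper are made up for this example, and it assumes 1MB means 1,048,576 bytes, which the documentation quoted here does not spell out.

```python
MB = 1024 * 1024  # assumption: 1MB read as 2**20 bytes

# Limits as listed in this post; the key names are invented for this sketch.
LIMITS = {
    "crawler_default": 15 * MB,   # default for Google's crawlers
    "googlebot_pdf": 64 * MB,     # PDFs crawled for Google Search
    "googlebot_other": 2 * MB,    # other supported file types
}


def split_at_limit(data: bytes, limit_name: str):
    """Return (bytes within the limit, count of bytes past the limit)."""
    limit = LIMITS[limit_name]
    kept = data[:limit]
    return kept, len(data) - len(kept)


payload = b"x" * (3 * MB)  # stand-in for the body of a 3MB file

kept, dropped = split_at_limit(payload, "googlebot_other")
print(len(kept), dropped)   # 2097152 1048576

kept_all, dropped_none = split_at_limit(payload, "crawler_default")
print(len(kept_all), dropped_none)  # 3145728 0
```

The same 3MB file loses its last megabyte under the 2MB limit but fits comfortably under the 15MB default, which is why it matters which limit applies to a given fetch.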
Why should we care? Most sites won’t come close to these limits, but it’s still worth knowing where the boundaries are: on an unusually large file, anything past the cutoff is ignored, and that can affect your SEO.
Inspired by this post on Search Engine Land.

