From Googlebot to GPTBot: who’s crawling your site in 2025

Web crawlers are not new. The World Wide Web Wanderer debuted in 1993, though the first web search engines to truly use crawlers and indexers were JumpStation and WebCrawler. Crawlers underpin one of the pillars of the Internet's success: search. Their main purpose has been to index the content of websites across the Internet so that those websites can appear in search engine results and direct users appropriately. In this blog post, we analyze recent trends in web crawling, which has taken on a crucial and complex new role with the rise of AI.

Not all crawlers are the same. Bots, automated scripts that perform tasks across the Internet, come in many forms: those considered non-threatening or "good" (such as API clients, search indexing bots like Googlebot, or health checkers) and those considered malicious or "bad" (like those used for credential stuffing, spam, or scraping content without permission). In fact, according to Cloudflare Radar data, around 30% of global web traffic today comes from bots, and in some locations bot traffic even exceeds human traffic.

A new category, AI crawlers, has emerged in recent years. These bots collect data from across the web to train AI models, improving tools and experiences, but also raising issues around content rights, unauthorized use, and infrastructure overload. We aimed to confirm the growth of both search and AI crawlers, examine specific AI crawlers, and understand broader crawler usage.

This is increasingly relevant with the rapid adoption of AI, growing content rights concerns, and data privacy discussions. Some sites and creators are looking to limit or block AI crawlers using tools like robots.txt or firewall rules. Others, like Dutch indie maker and entrepreneur Pieter Levels, have embraced them: “I’m 100% fine with AI crawlers… very important to rank in LLMs [large language models]”.

It's important to note that crawlers serve different purposes. For example, the facebookexternalhit bot is not included in this analysis, as Facebook uses it to fetch page content when generating previews for shared links. In this post, we focus only on AI and search crawlers that index or scrape website content.

AI-only crawlers perspective

Let's start with the AI-only crawler view currently available on Cloudflare Radar, which covers only crawlers advertised as AI-related. To identify them, we use a list derived from an open-source project that helps website owners manage and control access to AI crawlers, especially those used to train large language models (LLMs). The project also provides guidance on what to include in robots.txt files (more on that below). The data shown below is based on matching those crawler names against user-agent strings in HTTP requests. (Further details about this method, including one exception, can be found at the end of the blog post.)
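To make the matching step concrete, here is a minimal sketch of user-agent matching and share calculation. It is not Cloudflare's implementation: the token list (AI_CRAWLER_TOKENS) is a hypothetical stand-in built from the bot names in this post, and the case-insensitive substring match and the helper names match_ai_crawler and crawler_shares are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical token list for illustration: AI crawler names discussed in this
# post. The real list is derived from the open-source project mentioned above.
AI_CRAWLER_TOKENS = [
    "GPTBot",
    "ClaudeBot",
    "Meta-ExternalAgent",
    "Amazonbot",
    "Bytespider",
    "Applebot",
]


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first known crawler name found in a User-Agent string, else None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:  # case-insensitive substring match
            return token
    return None


def crawler_shares(user_agents: List[str]) -> Dict[str, float]:
    """Each crawler's share of the requests that matched any AI crawler token."""
    counts = Counter(
        bot for bot in (match_ai_crawler(ua) for ua in user_agents) if bot is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bot: n / total for bot, n in counts.items()}


# Illustrative, simplified User-Agent strings (not captured traffic).
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.1)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]
print(crawler_shares(sample))  # {'GPTBot': 0.5, 'ClaudeBot': 0.5}
```

Because the User-Agent header is self-reported and easy to spoof, a match like this only identifies crawlers that advertise themselves; it does not prove a request actually came from that operator.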

The AI crawler landscape shifted significantly between May 2024 and May 2025. GPTBot (from OpenAI) emerged as the dominant force, with its share of AI crawler requests surging from 5% to 30%, and Meta-ExternalAgent (from Meta) made a strong new entry at 19%. This growth came at the expense of former leader Bytespider, which plummeted from 42% to 7%, as well as other AI crawlers like ClaudeBot and Amazonbot, which also saw declines. Our data clearly indicates a reordering of the top AI crawlers, highlighting the increasing prominence of OpenAI and Meta in this category.

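Rank | Bot name (May 2024) | Share (May 2024) | Bot name (May 2025) | Share (May 2025)
-----|---------------------|------------------|---------------------|-----------------
1    | Bytespider          | 42%              | GPTBot              | 30%
2    | ClaudeBot           | 27%              | ClaudeBot           | 21%
3    | Amazonbot           | 21%              | Meta-ExternalAgent  | 19%
4    | GPTBot              | 5%               | Amazonbot           | 11%
5    | Applebot            | 4.1%             | Bytespider          | 7.2%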

For additional context, the list below includes further information about the bots with higher crawling shares seen above. This information comes from the same open-source list mentioned above and from publications by companies like OpenAI, which explain how their crawlers are used. 

  • GPTBot – OpenAI’s crawler used to improve and train large language models like ChatGPT.

  • ClaudeBot – Anthropic’s crawler for training and updating the Claude AI assistant.

  • Meta-ExternalAgent – Meta’s bot likely used for collecting data to train or fine-tune LLMs.

  • Amazonbot – Amazon’s crawler that gathers data for its search and AI applications.

  • Bytespider – ByteDance's AI data collector, often linked to training ByteDance's own models or TikTok-related AI.

  • Applebot – Apple’s web crawler primarily for Siri and Spotlight search, possibly used in AI development.

  • OAI-SearchBot – OpenAI’s search-focused crawler, likely used for retrieving real-time web info for models.

  • ChatGPT-User – Represents API-based or browser usage of ChatGPT in connection with user interactions.

  • PerplexityBot – Crawler from Perplexity.ai, which powers their AI answer engine using real-time web data.

Webmasters can tell crawler operators whether they want these bots to access their content by setting rules in a file called robots.txt, which tells crawlers which pages they should or shouldn't access. As we've seen recently, honoring robots.txt policies is voluntary for crawlers, but Cloudflare has announced tools like AI Audit to help content creators enforce them.
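As an illustration of how a well-behaved crawler applies these rules, the sketch below uses Python's standard-library urllib.robotparser against a hypothetical robots.txt that blocks GPTBot site-wide and keeps /private/ off-limits to everyone else. The policy and the example.com URLs are assumptions; nothing forces a crawler to run this check, which is exactly why compliance is voluntary.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy for illustration: block GPTBot everywhere,
# and block /private/ for every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # False
```

Group rules are matched against the crawler's product token, so a crawler that identifies as GPTBot picks up the first group, while Googlebot falls through to the wildcard group.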

As we've seen, the web crawling landscape is evolving rapidly, driven by the converging roles of search engines and AI. AI is now deeply integrated into search, as seen in Google's AI Overviews and AI Mode, and into social media platforms, such as Meta AI on Instagram. So, let's broaden our analysis to include these wider AI-driven crawling activities.

General AI and search crawling growth: +18%

A broader view reveals the growth of crawling traffic from both search and AI crawlers over the first few months of 2025. To remove the bias introduced by Cloudflare's own customer growth, we analyze trends using a fixed set of customers from specific weeks (a method we've used in our Cloudflare Radar Year in Review): the first week of May 2024, a week in November 2024, and the first week of April 2025.

Using that method, we found that AI and search crawler traffic grew by 18% from May 2024 to May 2025 (comparing full-month periods). The increase was even higher, at 48%, when including new Cloudflare customers added during that time. Peak AI and search crawling traffic occurred in April 2025, with a 32% increase compared to May 2024. This confirms that crawling traffic has clearly risen over the past year, though not at a steady pace. Google remains the dominant player, and its share is growing too, as we'll see in the next section.
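The difference between the 18% and 48% figures comes from which customers are counted. The sketch below shows the idea with toy numbers chosen only to mirror those two percentages (they are not Cloudflare data). It assumes the fixed cohort is the set of customers present in the base period; the helper names are hypothetical.

```python
from typing import Dict

# Toy numbers (not Cloudflare data): monthly AI + search crawler requests per
# customer. Customer "c" joined between the two periods.
may_2024 = {"a": 100, "b": 100}
may_2025 = {"a": 130, "b": 106, "c": 60}


def growth(before: float, after: float) -> float:
    """Relative change, e.g. 0.18 means +18%."""
    return after / before - 1


def fixed_cohort_growth(base: Dict[str, int], later: Dict[str, int]) -> float:
    """Growth counting only customers present in the base period."""
    # Customers that left count as 0 requests; new customers are ignored.
    later_total = sum(later.get(customer, 0) for customer in base)
    return growth(sum(base.values()), later_total)


def all_customers_growth(base: Dict[str, int], later: Dict[str, int]) -> float:
    """Growth including customers added after the base period."""
    return growth(sum(base.values()), sum(later.values()))


print(f"fixed cohort:  {fixed_cohort_growth(may_2024, may_2025):+.0%}")  # +18%
print(f"all customers: {all_customers_growth(may_2024, may_2025):+.0%}")  # +48%
```

Holding the customer set fixed isolates how much more each existing site is being crawled, while the all-customers figure also reflects sites that simply started using Cloudflare.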

As the next chart shows, crawling traffic increased sharply in March and April 2025 and remained high, though slightly lower, in May.

The patterns in the chart above also seem to reflect seasonal trends in general human Internet traffic. In 2024, crawling traffic dropped during the Northern Hemisphere summer, with August and September being the least active months. Then, like overall Internet traffic, it rose in November, when people are typically more online due to shopping and seasonal habits, as we've seen in past analyses.

Googlebot crawling grew 96% in one year

Googlebot, which indexes content for Google Search, was clearly the top crawler throughout the period and showed strong growth, with its requests up 96% from May 2024 to May 2025. Its crawling traffic peaked in April 2025, 145% higher than in May 2024. It's also worth noting that Google made changes to its search during this time, launching AI Overviews in its search engine, first in the US in May 2024 and later in more countries.

Two trends stand out when looking at daily data for Google-related crawlers, as shown in the graph below. First, Googlebot and the more recent GoogleOther (a web crawler from 2023 for “research and development”) account for most of Google’s crawling activity. Second, there were two visible drops in crawling traffic: one on December 14, 2024 (around a Google Search update), and another from May 20 to May 28, 2025. That May 20 drop occurred around the same time as the rollout of AI Mode on Google Search in the US, although the timing may be coincidental.

Breakdown of top 20 AI and search web crawlers 

Ranking crawlers by their share of total requests gives a clearer picture of which bots are gaining or losing ground, especially among those focused on search and AI. The table below shows a clear trend: some AI bots have grown rapidly since last year (with growth beginning even earlier), while many traditional search crawlers have remained flat or lost share (as in the case of Bing and its Bingbot crawler). The main exception is Googlebot.

The table below shows each crawler's percentage share of all crawling traffic generated by this specific cohort of over 30 AI and search crawlers observed by Cloudflare in May 2024 and May 2025, along with the change in percentage points and the growth or decline in raw request volume. Crawlers are ranked by their share in May 2025. Key shifts include a sharp rise in GPTBot requests (+305%) and a dramatic drop for Bytespider (-85%).

Rank | Bot name        | Share (May 2024) | Share (May 2025) | Change (pp) | Raw request growth (May 2024 to May 2025)
-----|-----------------|------------------|------------------|-------------|-----------------------------------------
1    | Googlebot       | 30%              | 50%              | +20 pp      | +96%
2    | Bingbot         | 10%              | 8.7%             | -1.3 pp     | +2%
3    | GPTBot          | 2.2%             | 7.7%             | +5.5 pp     | +305%
4    | ClaudeBot       | 11.7%            | 5.4%             | -6.3 pp     | -46%
5    | GoogleOther     | 4.4%             | 4.3%             | -0.1 pp     | +14%
6    | Amazonbot       | 7.6%             | 4.2%             | -3.4 pp     | -35%
7    | Googlebot-Image | 4.5%             | 3.3%             | -1.2 pp     | -13%
8    | Bytespider      | 22.8%            | 2.9%             | -19.8 pp    | -85%
9    | Yandex          | 2.8%             | 2.2%             | -0.7 pp     | -10%
10   | ChatGPT-User    | 0.1%             | 1.3%             | +1.2 pp     | +2,825%
11   | Applebot        | 1.9%             | 1.2%             | -0.7 pp     | -26%
12   | Timpibot        | 0.3%             | 0.6%             | +0.3 pp     | +133%
13   | Baiduspider     | 0.5%             | 0.4%             | -0.1 pp     | +7%
14   | PerplexityBot   |                  |                  |             |

Source: Cloudflare
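The share and growth columns in the table above are linked: a crawler's requests equal its share times the cohort's total, so its raw growth is its share ratio multiplied by the total's growth factor. The sketch below checks this, assuming (our assumption, not stated in the table) that the cohort's total crawler traffic grew by roughly the 18% reported for the fixed customer set. The helper names are hypothetical.

```python
def pp_change(share_before: float, share_after: float) -> float:
    """Percentage-point change between two shares expressed in percent."""
    return share_after - share_before


def implied_raw_growth(share_before: float, share_after: float, total_growth: float) -> float:
    """Raw request growth of one crawler implied by its shares and the total's growth.

    requests = share * total, so
    requests_after / requests_before = (share_after / share_before) * (1 + total_growth).
    """
    return (share_after / share_before) * (1 + total_growth) - 1


# Assumption: the cohort's total crawler traffic grew about 18% (fixed-customer figure).
TOTAL_GROWTH = 0.18

for name, before, after in [("Googlebot", 30, 50), ("GPTBot", 2.2, 7.7), ("Bytespider", 22.8, 2.9)]:
    pp = pp_change(before, after)
    raw = implied_raw_growth(before, after, TOTAL_GROWTH)
    print(f"{name}: {pp:+.1f} pp, implied raw growth {raw:+.0%}")
```

For Googlebot (30% to 50%) this gives roughly +97% against the reported +96%, and for GPTBot roughly +313% against +305%. Since the published shares are rounded, small gaps like these are expected.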