AI Crawlers Slowing Down Websites: What You Need To Know

Artificial Intelligence (AI) is changing how we use the internet, but this is starting to come at a cost for some website owners, as AI crawlers bombard sites with mind-boggling numbers of requests in their quest to gather as much data as possible. As you can imagine, this takes a toll on bandwidth and website performance. This blog will explain how these AI bots work and how to spot when your site is being scraped like the last bit of peanut butter out of the jar. We’ll also show you how, with the right tools and Web Hosting, you can keep your site running as it should.

KEY TAKEAWAYS

  • AI crawlers can hit websites with high-volume traffic to gather data, consuming bandwidth and reducing performance.
  • AI crawlers are a type of bot that collects data from the web to train AI models.
  • Search engine crawlers are designed to drive site traffic. AI crawlers scrape content to feed LLMs, negatively impacting site traffic and SEO.
  • Preventing AI crawler slowdowns includes using robots.txt and llms.txt files, setting firewall rules, using a CDN, or third-party bot management tools.
  • Domains.co.za’s CloudLinux integration isolates websites, ensuring performance is not affected by other sites on the same server.
  • Upgrading to VPS or Managed cPanel Web Hosting gives your site its own resources, protecting it from traffic spikes.

What are AI Crawlers?

Instead of using Google and other search engines to find information like in the good old days, more and more people are turning to generative AI, which relies on its own crawlers and bots to get answers.

Before we get into the ins and outs of why AI crawlers can slow down your site, it’s a good idea to understand bots and what they are used for.

In a nutshell, bots are automated software applications and scripts that perform a range of jobs across the Internet. They are designed to do repetitive tasks faster and more efficiently than humans. There are good ones, like search engine crawlers, and bad ones, like those used for spam and DDoS (Distributed Denial-of-Service) attacks.

Strip Banner Text - AI crawlers are bots that gather website data to train LLMs

AI crawlers are a new type of bot that continuously collects information from across the web to create massive datasets to train their respective AI models.

Compared to traditional bots, AI crawlers are more sophisticated as they are designed to understand and interpret content in a human-like way. This includes extracting information from text, images, and videos, with some even processing dynamic content rendered with JavaScript.

AI crawlers, in this context, technically fall into the “bad” category. This is because, unlike search engine crawlers, which index content, these bots can generate massive volumes of traffic that repeatedly access pages.

This can happen thousands of times a minute, chewing up terabytes of bandwidth, straining server resources, and leading to slow site performance and even downtime, which in turn means lost traffic, lower SEO rankings, and fewer conversions.

Here’s a fun statistic for you: training now makes up just under 78% of AI bot activity.

Search Engine vs AI Bots

Search engines, as we know, are designed to drive traffic to websites through the links listed in results pages after you type in a query. Traditional search engine bots are “benign” and crawl websites to index your content. This means they scan, process, and store your site’s content to understand it and display the link in relevant search query results.

They also have limits in place for the number of requests that can be made at a time to avoid overwhelming servers and websites.

AI crawlers, on the other hand, used for training Large Language Models (LLMs) like ChatGPT and Perplexity, scrape massive amounts of website content to feed into their respective LLMs. However, they send far fewer visitors back to the sites they take that content from, usually without so much as a link in return.

Some of the biggest and most active ones are:

  • GPTBot from OpenAI, the creator of ChatGPT.
  • ClaudeBot from Anthropic, the company behind the Claude AI assistant.
  • PerplexityBot from Perplexity AI, which runs its search and answering service.
  • Google-Extended, used for Google’s AI products, like the Gemini assistant.
  • Bytespider, operated by ByteDance, TikTok’s parent company.

What this Could Mean for your Website

Studies from the Pew Research Center have already indicated that the AI Overviews shown at the top of Search Engine Results Pages (SERPs) are contributing to a marked decline in site traffic, because people get the answer they want right away instead of having to visit a website.

This could mean lots of bot hits, but far fewer people visiting your site, leading to lower Click-Through Rates (CTRs) and fewer conversions. It can also skew important metrics like pageviews, bounce rates, and engagement because, although it looks like you’re getting a lot of traffic, it’s not coming from humans.

To highlight how much traffic bots now account for: according to Cloudflare, 30% of global web traffic comes from bots.

Cloudflare CEO Matthew Prince, speaking at an Axios event in Cannes in June 2025, noted that, traditionally, for every six times Google crawled a website, one person might visit and potentially view ads. In contrast, the rate was about 250 to 1 with OpenAI’s crawlers and 6,000 to 1 with Anthropic’s. Today, he estimates that Google’s crawl-to-visitor ratio has declined to 18 to 1, OpenAI’s has worsened to 1,500 to 1, and Anthropic’s is approximately 60,000 to 1.

Many AI crawlers used for training, as we pointed out earlier, are much more aggressive, and that’s where the problem comes in. These crawlers have been known to request large batches of web pages in short, high-volume bursts, often making thousands of requests a minute, hitting sites with intense, sudden traffic spikes.

As mentioned earlier, because they are designed to read a page like a human would, they extract the full page HTML, text, metadata, and media (images, videos, etc.). Newer versions can even attempt to follow links or execute JavaScript and render pages the same way a visitor would. This, as you can imagine, chews up a huge amount of server resources, causing slow performance and even downtime.

What’s more, despite claims from the AI companies above, many training bots have been known to ignore crawl-delay and other robots.txt directives (which dictate what can and can’t be crawled on a site), along with the bandwidth-saving guidelines standard search engine crawlers follow.

How to Stop AI Crawlers from Accessing your Website

If you’ve noticed that AI crawlers are using up a lot of bandwidth and resources, resulting in massive traffic spikes, slow page speeds, and high bounce rates, there are a few things you can do to stem the tide.

While you can’t stop all bots, here are four methods to help reduce their impact and keep them off your site as much as possible.

Robots.txt and llms.txt Files

The robots.txt file tells bots what they can and can’t do on your site. It is a standard, simple text file that most good bots, including those from major search engines and AI companies like OpenAI and Google, will respect.

To disallow all AI bots from crawling your site for training purposes, you can add a rule to your robots.txt file in your website’s root directory. For example:

User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
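# The same two-line pattern covers any other AI crawler you want to exclude, e.g.:
User-agent: Google-Extended
Disallow: /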

This tells the listed crawlers not to crawl your site. You can find a full list of user agents for various AI bots online. However, robots.txt is only advisory, and some AI crawlers simply ignore it.

There’s also a newer file type, llms.txt, coming into play. Although not widely used at the moment, it can provide an alternative way of defining how your content may be used by LLM training crawlers.

The principle is similar, but where robots.txt focuses on crawl behaviour in general, llms.txt defines permissions specifically for AI behaviour and data use.

Using both files together can give you more control over managing crawler traffic, especially as new bots appear and training models evolve.
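Because the standard is still settling, treat the following as a rough sketch of what an llms.txt file (placed in your website’s root directory, like robots.txt) might look like under the current proposal. The company name, URLs, and page descriptions are placeholders:

# Example Company
> A plain-language summary of what this site is about, written for AI systems.

## Key pages
- [Products](https://example.com/products): An overview of our product range
- [Contact](https://example.com/contact): How to get in touch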

Strip Banner Text - Slowdowns are caused by thousands of bot requests consuming bandwidth

Setting Firewall Rules

For bots that ignore robots.txt rules, a Web Application Firewall (WAF) can stop them at the server level.  

This involves creating firewall rules that identify and block requests from known AI crawler user agents (e.g., GPTBot, PerplexityBot) or suspicious IP address ranges, so the server simply drops traffic from known AI networks before it reaches your site.
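As a minimal sketch, assuming an Apache server (or a hosting plan that honours per-site .htaccess files) with mod_rewrite enabled, blocking the bot names mentioned in this post could look like this:

# Return 403 Forbidden to known AI crawler user agents (names from this post)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|PerplexityBot|Bytespider) [NC]
RewriteRule .* - [F,L]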

Another option is rate limiting, which restricts the number of requests a single IP address can make in a given timeframe. For example, you can set a rule to block an IP if it makes more than 50 requests per minute, which is a common characteristic of aggressive bots.
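If your server runs Nginx instead, a minimal sketch of that 50-requests-per-minute rule might look like this (the zone name, burst size, and limits are illustrative, not recommendations):

# Track each client IP and allow at most 50 requests per minute
limit_req_zone $binary_remote_addr zone=perip:10m rate=50r/m;

server {
    location / {
        # Allow a small burst, then answer excess requests with HTTP 429
        limit_req zone=perip burst=20 nodelay;
        limit_req_status 429;
    }
}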

However, this method isn’t for everyone. If you are on a shared hosting plan, you won’t have access to server-level settings, as the hosting provider manages the servers.

Even assuming you do have access, it can also be risky, as a misconfigured rule can lock you out of your own server or block legitimate traffic.

Use a Content Delivery Network

A Content Delivery Network (CDN) can be a good way of absorbing and mitigating the high-volume traffic from AI crawlers. A CDN works by caching static content (like images, CSS, and JavaScript files) on a network of servers located around the world.

When an AI bot (or human visitor) requests content, the CDN serves it from the server closest to their location, rather than fetching it from your origin server every time, reducing server strain as well as bandwidth consumption.
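For the CDN to help, your site has to mark content as cacheable. A minimal sketch, again assuming Apache with mod_headers enabled (the file types and one-year lifetime are purely illustrative), might be:

# Mark common static assets as cacheable so a CDN can serve them from its edge
<IfModule mod_headers.c>
  <FilesMatch "\.(css|js|png|jpe?g|gif|webp|svg)$">
    Header set Cache-Control "public, max-age=31536000, immutable"
  </FilesMatch>
</IfModule>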

This method can work; however, the CDN has to deal with thousands of IPs across the world, and the bots are a moving target, as new ones emerge on new networks and existing ones change IPs.

AI Bot Blocking Tools

For a more advanced approach, third-party bot management services and tools are available. These are simpler than configuring firewall rules yourself, especially for beginners without server access.

They use a combination of behavioural analysis and traffic pattern matching to differentiate between benign crawlers and those that could cause performance issues.

For example, Cloudflare’s Bot Management includes Bot Fight Mode, which can identify AI crawlers and allows you to implement rules to allow good bots to do their thing while blocking the bad ones from accessing your pages.
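For illustration only (check Cloudflare’s documentation for the current fields and syntax), a custom rule blocking the training bots named in this post might use an expression like:

(http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot") or (http.user_agent contains "PerplexityBot") or (http.user_agent contains "Bytespider")

with the rule’s action set to Block.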

At the same time, you can get reports and analytics of what type of bot traffic is hitting your site, where they’re coming from, and the resources they are consuming.

Finally, you can upgrade your web hosting for more resources and control.

Preventing Slowdowns with Domains.co.za

When it comes to web hosting for SMEs, shared hosting plans are designed to be cost-effective by placing many different websites on a single server. This means they share the same CPU, RAM, and bandwidth.

This can be a problem if one site gets hit with a flood of aggressive AI crawler traffic, as the high resource usage can slow down the entire server (known as the noisy neighbour effect). This is where the CloudLinux server software used across Domains.co.za’s entire hosting infrastructure makes all the difference when it comes to speed and stability.

The CloudLinux system isolates each website on the server into its own Lightweight Virtual Environment (LVE). The LVE technology prevents a single site from consuming all CPU, RAM, and bandwidth available. This means your site’s performance remains stable, regardless of what’s happening with others on the server.

It also implements fair resource allocation, ensuring that every website gets its fair share of the server’s CPU, RAM, and bandwidth. This prevents your site from slowing down, even if AI bots are hitting another site.

Beyond that, upgrading to a Virtual Private Server (VPS) or Managed cPanel Web Hosting plan gives your site access to its own resources, similar to a dedicated server, but without the cost.

VPS (Virtual Private Server) Hosting

With a VPS Hosting plan (available for Windows or Linux systems) from Domains.co.za, your site gets its own virtual environment. Having a VPS allows you to access settings and configurations that are unavailable on a standard shared hosting plan, meaning not only more resources, but more control over them.

Additionally, it allows you to implement custom rules and install specialised software to protect your site from further unwanted bot activity. You can also fine-tune your settings to optimise performance for your specific needs, such as increasing bandwidth limits or adjusting processing power.

Managed cPanel Hosting

The next step up is Managed cPanel Web Hosting. This hosting type helps mitigate the impact of aggressive AI crawlers by giving you your own virtual machine, similar to our VPS Hosting, with the added benefit of being managed for you.

With Domains.co.za, you get the dedicated resources your site needs for maximum performance and stability, set up to your specifications and expertly managed, letting you focus on growing your business.

Strip Banner Text - Data-led marketing, smart targeting & multiple domains can help expand reach & results. [Learn more]

FAQs

What are AI crawlers?

AI crawlers are bots that collect data from websites to train and improve generative AI models. Unlike traditional search engine crawlers that index content to provide search results, AI crawlers often extract the content itself to use in AI products.

How are AI crawlers different from search engine crawlers?

Traditional crawlers, like Googlebot, are designed to find and index content to help users discover your site. AI crawlers, on the other hand, are focused on gathering large amounts of data to train AI models, often with little to no benefit in terms of driving traffic back to your site.

How do I block AI crawlers from my website?

The most common way is to use a robots.txt file in your website’s root directory to add commands instructing crawlers not to access your site. More advanced methods include using a CDN or a firewall to block them.

Will blocking AI bots hurt my SEO rankings?

No. In fact, it could actually help protect them by preventing your site from being flooded with bots that interfere with legitimate crawling, indexing, and performance.

How do I know if an AI crawler is slowing down my site?

If your website’s bandwidth usage is increasing but your human traffic isn’t, AI crawlers may be the cause. Monitoring tools can also show high volumes of requests from agents like GPTBot or ClaudeBot.
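If you have access to your raw server logs, a quick sketch (assuming a standard access.log, whose name and path may differ on your setup) is to count hits per crawler:

# Count hits from each known AI crawler user agent in the access log
for bot in GPTBot ClaudeBot PerplexityBot Bytespider; do
  printf '%s: ' "$bot"
  grep -ic "$bot" access.log
done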
