The Great AI Pillage and Bot War

Insight 8 October 2025

Bot traffic is out of control

The internet is under siege, and intellectual property (IP) is being pillaged at an unprecedented rate. We own and manage hundreds of domains and have seen bot traffic grow 10x on average over the last 12 months. On many of our more informational sites, humans are now a minority of total traffic, at a ratio of 10 bots to 1 human. Our cybersecurity portfolio company Belljar handles billions of page requests and sees the same trend across the board. This is not just a nuisance; it is a threat to the very fabric of the web, and we need to take action to make the internet a better place for humans.

Dead Internet Theory: Not a theory anymore

The soft version of the "Dead Internet Theory" is that a large portion of Internet traffic and content is generated by non-human actors, such as bots, spiders, scrapers, and crawlers, some of which are AI powered while others are just automated scripts. We believe that this part of the theory is now a fact, and has been for a while. When so much of the Internet is consumed by these non-human entities, it is easy to extrapolate that much of the content we encounter online is also generated, manipulated, or amplified by whoever controls these bots. This aligns with the darker side of the theory, which hints that real human content is being drowned out, leading to a barren Internet landscape where all visible content exists to serve a particular agenda, whether that be commercial, political, or otherwise.

Poor Internet Etiquette

The rise of AI bots has also led to a steep decline in Internet etiquette. Before Large Language Models (LLMs) and AI bots, the companies crawling the web would often follow a set of rules and guidelines, such as the robots.txt file, which specifies which parts of a website can be crawled and indexed. With the LLM race on, a growing number of players are ignoring these rules, creating a hostile environment for website owners and operators. In the last 12 months, we experienced countless events where AI bots ignored the established norms of the web and overwhelmed our servers with unreasonable numbers of parallel requests, often leading to sustained high server load and degraded performance for our real human users.
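
For readers unfamiliar with robots.txt, here is a minimal sketch of how a well-behaved crawler is expected to consult it before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the crawler name "ExampleAIBot", and the example.com URLs are made up for illustration; they are not taken from any of our sites.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "ExampleAIBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("ExampleAIBot", "https://example.com/articles/1"),
    ("OtherCrawler", "https://example.com/articles/1"),
    ("OtherCrawler", "https://example.com/private/report"),
]:
    print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Note that the check runs entirely on the crawler's side. Nothing in the file format enforces it, which is why a bot that simply skips this step faces no technical barrier.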

The Bot War

The malicious AI bots are using every trick in the book to evade detection and continue their pillaging. Some of the tactics we have seen include:

  • IP Rotation: Using a large pool of IP addresses to avoid rate limiting and blocking. We blocked over 5 million IP addresses in the last 12 months.
  • Residential Proxies: Using residential IP addresses to appear as legitimate users. The issue here is that several residential internet providers are renting out IP addresses to AI bot operators, and blocking residential ranges will affect real users in the crossfire.
  • Captcha Solving Services: Using third-party services to solve CAPTCHAs and bypass security measures.
  • Bypassing CDNs and WAFs: Targeting the origin servers directly to avoid CDN-level protections. We are not naming names here, but a Hyperscaler did this to us, allegedly.
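
To illustrate why IP rotation is so effective, the sketch below simulates a simple per-IP sliding-window rate limiter and sends it the same burst of requests twice: once from a single address and once spread across a pool of addresses. The limiter class, the limit of 10 requests per 60 seconds, and the documentation-range IP addresses are all assumptions chosen for the example; this is not how Belljar or any particular WAF is implemented.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of allowed requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def count_allowed(limiter: SlidingWindowLimiter, ips: list, requests: int) -> int:
    """Send `requests` requests, cycling through `ips`, one every 0.1 s."""
    allowed = 0
    for i in range(requests):
        if limiter.allow(ips[i % len(ips)], now=i * 0.1):
            allowed += 1
    return allowed


# Hypothetical addresses from the reserved documentation ranges.
single_ip = ["203.0.113.7"]
ip_pool = [f"198.51.100.{n}" for n in range(1, 51)]

single = count_allowed(SlidingWindowLimiter(limit=10, window=60.0), single_ip, 100)
rotated = count_allowed(SlidingWindowLimiter(limit=10, window=60.0), ip_pool, 100)
print(single, rotated)  # prints "10 100"
```

With one address, only the first 10 of 100 requests get through. Spread over 50 addresses, each address sends just 2 requests, so all 100 pass under the same limit. A limiter keyed on IP alone cannot tell the rotated burst apart from 50 light users.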

As with any war, collateral damage is inevitable. It is now much more common to see real users being prompted with human verification challenges before a page can be loaded, leading to a poor human user experience.

The Great AI Pillage

The AI bots are not just consuming content; they are also stealing it (depending on how you or your country interprets copyright law). When an AI bot scrapes a website, it makes a verbatim copy of the content, which its operator stores in a training dataset and uses to train models. Some LLM-based systems go a step further and also pass the verbatim content into the prompt at query time (as in RAG, Retrieval-Augmented Generation), leading to an even grayer area around copyright infringement. As LLMs grow in size, they will be able to "generate" content that matches copyrighted work word for word, leading to a situation where the AI model is effectively redistributing copyrighted content illegally. We are closer to this than expected: a recent study showed that an open source LLM could reproduce half of a Harry Potter book.

The Future of the Web

The web was built on the principles of openness, collaboration, and respect for intellectual property. The rise of AI bots threatens to undermine these principles, leading to a web that is dominated by non-human actors and devoid of real human insights. To combat this, we need to take action on multiple fronts:

  • Stop the Bots: Website owners need to invest in better detection and mitigation strategies, such as our own Belljar WAF, to protect their content and ensure a good user experience for real humans.
  • Stronger Regulations: Governments need to implement stronger Intellectual Property protections to protect the rights of content creators and ensure a fair and open web.
  • Community Efforts: The web community needs to come together to implement standards and to share knowledge, resources, and best practices for dealing with the challenges posed by AI bots.

The Great AI Pillage is a wake-up call for all of us. We need to take action now to protect the web and ensure that it remains a vibrant and diverse space for human creativity and expression.

Written by Rafael Gracioso Martins, Managing Partner at Outroll, Director at Belljar.
