Cloudflare Launches Auto Block Bots for Websites


Cloudflare’s auto block bots system addresses the massive imbalance where AI companies extract content without providing value back to publishers, fundamentally reshaping web economics.

Cloudflare automatically blocks AI crawlers by default, requiring explicit permission for access across their vast network.

The Pay Per Crawl system lets website owners allow, block, or charge AI bots using the resurrected Payment Required status code.

Updated on: July 15, 2025

AI companies aggressively crawl websites but rarely send traffic back in return. OpenAI’s crawl-to-referral ratio reached 1,700:1 by June 2025, while Anthropic showed an even more dramatic 73,000:1 ratio. These AI bots extract huge amounts of content and generate almost no return traffic to publishers. Cloudflare has tackled this problem by launching an automatic block bots feature that gives website owners better control over AI crawlers.

Website owners who use Cloudflare can now automatically stop AI companies from collecting their digital data. This protection extends to a large share of the web, since Cloudflare serves roughly 20% of all websites. The company has also rolled out Pay Per Crawl, which works smoothly with existing web systems and uses HTTP status codes to create a framework that charges for content access. This represents a radical change in how websites handle and monetize bot traffic, especially since Cloudflare's CDN directly processes about 16% of global internet traffic.


Cloudflare activates default AI bot blocking

Cloudflare will now, by default, block all AI crawlers.

Source: TechCrunch

Cloudflare has become the first major provider to block AI crawlers automatically from accessing website content without permission or compensation. This default setting represents a fundamental change from the previous opt-out approach to a permission-based model for AI web scraping.

Why AI crawlers are now blocked by default

Customer demand drove this decision, with over 1 million Cloudflare customers already using optional AI-bot-blocking tools. The company discovered a broken relationship between websites and AI crawlers. Search engines used to index content and direct users back to original sites, which generated traffic and revenue. AI crawlers now collect content without sending visitors to the source, which deprives creators of revenue and recognition.

Recent data from June 2025 shows this imbalance clearly. Google crawls websites about 14 times for every referral it sends, but OpenAI's crawl-to-referral ratio reaches 1,700:1, and Anthropic's climbs to 73,000:1. These numbers show that AI training crawlers have broken the once-beneficial relationship between websites and crawlers.

Results are already visible. Bytespider, formerly the most active AI bot, has seen its traffic volume drop by 71.45% since July 2024. GPTBot's traffic volume has grown substantially, yet the share of sites it crawls fell from 35.46% to 28.97%, which suggests more customers are actively blocking these crawlers.

How this changes the web crawling industry

This policy dramatically shifts the balance of power on the web. New domains signing up with Cloudflare must now choose whether to allow AI crawlers. This effectively changes the web from an open-by-default system to one where AI companies need permission.

Cloudflare can also identify shadow scrapers that AI companies don't publicly disclose. The company combines:

  • Behavioral analysis
  • Fingerprinting
  • Machine learning

These techniques help separate AI bots from legitimate crawlers. This matters because only 37% of the top 10,000 domains currently have a robots.txt file.

The ecosystem now moves from passive signals like robots.txt to enforceable protections through Web Application Firewalls.
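For comparison, the passive signal being replaced looks like this. A robots.txt file politely asks crawlers to stay away, but nothing enforces it; the user-agent names below are ones these companies have published, and the file itself is an illustrative sketch:

```
# robots.txt — an advisory request that well-behaved crawlers honor,
# with no enforcement mechanism behind it
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /
```

A WAF rule, by contrast, rejects the request at the network edge regardless of whether the crawler chooses to read this file.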

If the Internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone.

Source: Matthew Prince, Cloudflare’s CEO

Cloudflare enables monetization through HTTP 402

Rather than blocking AI bots outright, Cloudflare has revived HTTP response code 402 – "Payment Required". This long-dormant web standard now gives website owners a way to monetize and control AI crawler access. The solution strikes a balance between total bot blocking and unrestricted content access.

What is HTTP 402 and why it matters now

The HTTP 402 status code was designed for payment scenarios but remained unused until recently. Cloudflare has adapted this code to build a permission-based system for AI bots. AI crawlers that request content now receive either successful access with HTTP 200 status or a 402 Payment Required response with pricing details.
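That exchange boils down to a simple decision: serve the content with 200, or answer 402 and advertise a price. The Python sketch below is illustrative, not Cloudflare's implementation; the `crawler-price` header name follows the article's description of the system.

```python
# Minimal sketch of the 402 negotiation described above (header name
# follows the article; the exact wire format is Cloudflare's to define).

def respond_to_crawler(crawler_allowed: bool, price_usd: str):
    """Return (status, headers) for an incoming AI crawler request."""
    if crawler_allowed:
        return 200, {}                           # free access, content served
    # Otherwise withhold content and advertise the per-request price.
    return 402, {"crawler-price": price_usd}     # Payment Required
```

A crawler receiving the 402 can then decide whether the quoted price is worth paying and retry.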

How Pay Per Crawl works for publishers

Website owners can fully control their monetization strategy with Pay Per Crawl using three options for each crawler:

  • Allow: Grant the crawler free access to content
  • Charge: Require payment at a configured domain-wide price
  • Block: Deny access entirely with no option to pay

Publishers can select "charge" even for crawlers that have no billing relationship with Cloudflare. This effectively blocks access while signaling that monetization is on the table. Pay Per Crawl runs after existing security measures, integrating with Cloudflare's rules engine behind WAF policies and bot management features.
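The three-way choice can be modeled as a small policy table. Everything in this sketch — the crawler names, the price, and the use of 403 for a hard block — is an illustrative assumption, not Cloudflare's actual configuration format:

```python
# Hypothetical per-crawler policy table for the allow / charge / block model.
POLICIES = {"gptbot": "allow", "claudebot": "charge", "bytespider": "block"}
DOMAIN_PRICE = "0.01"  # single domain-wide price, as the article describes

def decide(crawler: str, has_billing: bool):
    """Return (status, price) for a crawler under the publisher's policy."""
    policy = POLICIES.get(crawler, "block")      # unknown crawlers: block
    if policy == "allow":
        return 200, None                         # free access
    if policy == "charge" and has_billing:
        return 200, DOMAIN_PRICE                 # serve and bill at set price
    if policy == "charge":
        # No billing relationship: behaves like a block, but the 402
        # advertises that access could be purchased in future.
        return 402, DOMAIN_PRICE
    return 403, None                             # blocked outright
```

Note how "charge" without a billing relationship degrades gracefully into a block that still communicates the price.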

How AI bots authenticate and pay for access

The system uses Ed25519 cryptographic signatures to prevent crawler spoofing. AI bots must register with Cloudflare and provide their key directory URL and user agent information. Two payment workflows exist: reactive discovery, where crawlers get pricing through crawler-price headers and retry with payment acceptance, or proactive intent, where crawlers include crawler-max-price headers in their first requests. Access continues with transaction confirmation when content pricing stays within specified limits.
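The proactive-intent workflow can be sketched as follows. The `crawler-max-price` and `crawler-price` header names come from the article's description; the confirmation header name here is an assumption for illustration:

```python
# Sketch of the proactive-intent workflow: the crawler declares a price
# ceiling up front, and access proceeds only if the content price fits.

def handle_request(headers: dict, content_price: float):
    """Serve content if the crawler's declared budget covers the price."""
    max_price = headers.get("crawler-max-price")
    if max_price is not None and float(max_price) >= content_price:
        # Within budget: serve content and confirm the charge
        # (confirmation header name is illustrative).
        return 200, {"crawler-charged": str(content_price)}
    # Otherwise fall back to reactive discovery: quote the price via 402.
    return 402, {"crawler-price": str(content_price)}
```

Under reactive discovery, the crawler's first request carries no budget header, receives the 402 quote, and retries with payment acceptance if the price is acceptable.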

Cloudflare manages all billing and payment distribution between AI companies and publishers as the merchant of record.

Also read: New AI Labyrinth Makes Bots Waste Hours In Data Loop

Publishers gain control over AI access and auto block bots

“This change offers those sites an immediate layer of defense against having their original content extracted, repurposed, and embedded in AI models.” 

Source: TekRevol Blog

Cloudflare has added detailed tools to its default bot blocking system. These tools help site owners control how AI crawlers access their content. This advancement helps publishers regulate AI training on their materials.

How to configure pricing and permissions

Setting up these controls is straightforward and needs minimal technical knowledge. Site administrators can access the WAF (Web Application Firewall) section of the Cloudflare dashboard. They can set these policies with just a few clicks. The interface lets publishers:

  • Create rules that block all AI bots except those from specific platforms
  • Set up negotiated contracts with selected AI partners
  • Monitor crawler activity via the AI Audit tab
  • Export detailed reports of most frequently accessed content

Cloudflare suggests updating Terms of Service to address AI training usage. This provides both technical and legal protection.

Role of Cloudflare’s rules engine in enforcement

Cloudflare's rules engine enforces these controls after existing security measures have run. Bot management features and WAF policies take priority, followed by pay-per-crawl decisions.

Publishers can create exceptions for selective access. These exceptions let specific crawlers bypass charges while restricting others. Site owners can allow certain crawlers free access while charging or blocking others based on their strategic priorities.

The system shows crawlers appropriate HTTP status codes that reflect the publisher’s control and pricing priorities. This creates a standardized framework for bot interactions across the web.
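The ordering described above — security rules first, then exceptions, then pay-per-crawl — can be sketched as a simple pipeline. The function names and status codes here are illustrative assumptions, not Cloudflare's internals:

```python
# Illustrative enforcement pipeline: WAF and bot management run first,
# publisher exceptions next, and pay-per-crawl decides last.

def enforce(request: dict, waf_blocks, exceptions: set, paycrawl_decide):
    """Return the HTTP status a crawler would receive."""
    if waf_blocks(request):              # security rules take priority
        return 403
    if request["crawler"] in exceptions:
        return 200                       # exception: bypass charges entirely
    return paycrawl_decide(request)      # allow (200), charge (402), or block
```

This layering means a crawler that trips a security rule never even reaches the pricing logic, while a trusted partner on the exception list never sees a 402.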

Also read: Global Crackdown Targets Botnet in Major DNS Attacks Disruption

Cloudflare prepares for agentic web future

Cloudflare foresees a web where intelligent software agents will negotiate content access on behalf of users. Their current bot-blocking system serves as groundwork that prepares them for the next wave of AI advancement.

How intelligent agents may negotiate access

Agentic AI consists of specialized agents that work independently to handle specific tasks and interact with data, systems, and people. Cloudflare's implementation of HTTP 402 creates the foundation for these agents to access digital resources through programmatic negotiation. Users could, for example, ask a research agent to compile information on cancer research or legal briefs and give that agent a budget to acquire relevant content.
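A budget-constrained agent interacting with 402 quotes might behave roughly like this. The sketch is hypothetical; it only assumes a paid resource answers 402 with a quoted price, as described above:

```python
# Hypothetical agent with a content budget reacting to 402 price quotes.

def agent_fetch(urls: list, quote, budget: float):
    """Acquire as much content as the user's budget allows.

    `quote(url)` returns (status, price): 200 means free access,
    402 means the content costs `price` to acquire.
    """
    acquired, spent = [], 0.0
    for url in urls:
        status, price = quote(url)
        if status == 200:
            acquired.append(url)                 # free content
        elif status == 402 and spent + price <= budget:
            spent += price                       # pay and fetch, within budget
            acquired.append(url)
    return acquired, spent
```

In a real agentic setting, the quoting and payment would happen over the headers described earlier rather than a local callback.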

The company’s Model Context Protocol (MCP) development lets AI systems connect with data sources in a standardized way. This protocol allows agents to authenticate securely with remote servers through integrated OAuth flows.

Potential for dynamic pricing and licensing models

Pay-per-crawl will likely grow well beyond its current form. Publishers could set different rates for different content types or paths within their sites, and pricing might vary with an AI application's user volume and overall demand.

The licensing world now moves toward more sophisticated models, including detailed licenses for training versus inference at internet scale. Licensing has traditionally served as the main way to allocate rights to collect, use, and share data. These changes help address the growing fragmentation in AI content licensing, where dozens of collective licensing entities now operate.

What this means for AI training and content value

This radical shift redefines how the AI ecosystem values content. Semrush research shows AI search visitors provide 4.4 times higher value than traditional organic traffic, which creates economic incentives for controlled access. Publishers must make strategic choices between optimizing their content for AI visibility and selectively blocking crawlers.

Cloudflare’s system helps website owners benefit economically from AI while they retain control over their intellectual property. 

By verifying crawler intent, a website owner has more granular control, which means they can leave it more open for real humans if they’d like.

Source: Matt Allen from Cloudflare

FAQs

Q1. What is Cloudflare’s new approach to AI bot management?

Cloudflare now automatically blocks AI crawlers by default, requiring explicit permission or compensation for access to websites. This change affects approximately 24% of all sites across the internet that use Cloudflare’s network.

Q2. How does Cloudflare’s Pay Per Crawl system work?

Pay Per Crawl allows website owners to monetize AI crawler access using HTTP 402 Payment Required status. Publishers can choose to allow, block, or charge AI bots for content access, with Cloudflare handling the billing and payment distribution.

Q3. Can website owners still allow some AI crawlers while blocking others?

Yes, Cloudflare’s system provides flexibility. Site administrators can create rules to block all AI bots except those from specific platforms, implement negotiated contracts with selected AI partners, and monitor crawler activity through the AI Audit tab in the dashboard.

Q4. How does Cloudflare prevent unauthorized AI crawlers from accessing content?

Cloudflare uses Ed25519 cryptographic signatures for authentication to prevent crawler spoofing. AI bots must register with Cloudflare, providing their key directory URL and user agent information to gain authorized access.

Q5. What does this change mean for the future of AI and web interactions?

This shift prepares for a future where intelligent software agents may negotiate content access autonomously. It also rebalances the relationship between content creators and AI companies, allowing publishers to maintain control over their intellectual property while potentially benefiting from new revenue streams related to AI training data.

Also read: How to Prepare Effective LLM Training Data

Conclusion

Cloudflare's groundbreaking approach to AI bot management marks a major shift in how websites interact with artificial intelligence systems. Website owners have gained unprecedented control over their digital content through default blocking of AI crawlers and the innovative Pay Per Crawl system. The previous power imbalance, in which AI companies extracted massive value without giving back, has started to change dramatically.

These changes create the foundation for an agentic web future where AI systems might autonomously negotiate access to information for users. The Model Context Protocol boosts this vision by enabling secure connections between AI systems and data sources through standardized authentication.

Publishers keep control over their intellectual property while discovering new revenue opportunities. AI development continues with proper attribution and compensation for the content that makes these systems valuable. Cloudflare’s solution represents both a technical fix and a necessary economic development for sustainable coexistence between human creativity and artificial intelligence on the web.
