# There is no search benefit to letting AI models scrape a site: they take content for their own profit, attribution-free, and then serve it to users without ever sending them to us.
# Reference: https://neil-clarke.com/block-the-bots-that-feed-ai-models-by-scraping-your-website/

## Common Crawl (https://commoncrawl.org/faq) - Its data has been used to train ChatGPT, Bard, and a number of other models.
User-agent: CCBot
Disallow: /

## The bot used when a ChatGPT user instructs it to reference your website.
User-agent: ChatGPT-User
Disallow: /

## The bot that OpenAI uses to collect bulk training data for ChatGPT.
User-agent: GPTBot
Disallow: /

## Block Google from scraping your site for Bard and Vertex AI.
User-agent: Google-Extended
Disallow: /

## Omgili sell data they scrape to others for their AI training.
User-agent: Omgilibot
Disallow: /
User-agent: Omgili
Disallow: /

## Meta’s bot that crawls public web pages to improve language models for their speech recognition technology.
User-agent: FacebookBot
Disallow: /

## Apple very kindly told us how to block their scraper AFTER they'd scraped everything.
User-agent: Applebot-Extended
Disallow: /

## Used by Anthropic to gather data for their “AI” products, such as Claude.
User-agent: anthropic-ai
Disallow: /

## Another agent used by Anthropic, more specifically related to Claude.
User-agent: ClaudeBot
Disallow: /

## Diffbot is a somewhat dishonest scraping bot used to collect data to train LLMs. This is their default user-agent, but they make it easy for their clients to change it to something else and ignore your wishes.
User-agent: Diffbot
Disallow: /

## This is just getting stupid and I hope governments step in.
User-agent: Bytespider
Disallow: /
User-agent: ImagesiftBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: cohere-ai
Disallow: /