# Category: Chinese Search Engine
# Hits: Minimal - robots & random URLs
# URL: https://www.so.com/
User-agent: 360Spider
Disallow: /

# Category: Commercial Web Scrape / Web Crawl Company Crap
# Hits: Unknown
# URL: http://80legs.com/the-80legs-web-crawler/
User-agent: 80legs
Disallow: /

User-agent: voltron
Disallow: /

# Category: Commercial Advertising Crap
# Hits: Moderate - robots
# URL: https://www.adbeat.com/operation_policy
User-agent: adbeat_bot
Disallow: /

# Category: Commercial Advertising Crap
# Hits: Moderate - Random URLs
# URL: https://www.admantx.com/
#User-agent: admantx
#Disallow: /

# Category: Commercial SEO Crap
# Hits: Minimal
# URL: https://seostar.co/robot/
User-agent: adsbot
Disallow: /

# Category: Unknown
# Hits: Unknown
# URL: Unknown
User-agent: AdsrvrBot
Disallow: /

# Category: Commercial Company Aggregating ads.txt
# Hits: Minimal - ads
# URL: https://www.adstxt.com/
User-agent: adstxt.com
Disallow: /

# Category: Unknown Sites Scanning ads.txt
# Hits: Minimal - ads
# URL: https://github.com/InteractiveAdvertisingBureau/adstxtcrawler
User-agent: AdsTxtCrawler
Disallow: /

User-agent: AppNexusAdsTxtCrawler
Disallow: /

User-agent: gumgumAdsTxtCrawler
Disallow: /

# Category: Commercial Marketing Link Crap
# Hits: Minimal - robots
# URL: https://ahrefs.com/
User-agent: AhrefsBot
Disallow: /

# Category: Distributed JVM App
# Hits: Minimal - ads
# URL: https://akka.io/
User-agent: akka-http
Disallow: /

# Category: Commercial SEO Crap
# Hits: Minimal - Index
# URL: http://alphaseobot.com/bot.html
# AKA: AlphaBot
User-agent: AlphaSeoBot
Disallow: /

# Category: Huawei Web Crawler
# Hits: High - robots
# URL: https://aspiegel.com/
User-agent: AspiegelBot
Disallow: /

# Category: Commercial Social Media monitoring
# Hits: Minimal - Non-working RSS Links
# URL: https://awario.com/bots.html
User-agent: AwarioRssBot
Disallow: /

User-agent: AwarioSmartBot
Disallow: /

# Category: Chinese Search Engine
User-agent: Baiduspider
Disallow: /

# Category: Commercial Data Mining
# Hits: Unknown
# URL: https://www.exensa.com/
User-agent: Barkrowler
Disallow: /

User-agent: BUbiNG
Disallow: /

# Category: Commercial Advertising Crap
# Hits: Excessive - robots & ads
# URL: https://www.bidswitch.com/
User-agent: bidswitchbot
Disallow: /

# Category: Commercial Advertising Crap
# Hits: Minimal - ads
# URL: https://bidtellect.com/
User-agent: Bidtellect
Disallow: /

# Category: Commercial SEO Backlink Crap
# Hits: Moderate - robots & random URLs
# URL: http://webmeup-crawler.com/
User-agent: BLEXBot
Disallow: /

# Category: Commercial Brand Protection Crap
# Hits: Moderate - random URLs
# URL: https://www.brandverity.com/why-is-brandverity-visiting-me
User-Agent: BrandVerity
Disallow: /

# Category: Commercial Pinterest Wannabe
# Hits: Minimal - Random URLs
# URL: https://www.bublup.com/bublup-bot
User-agent: BublupBot
Disallow: /

# Category: Lists what technologies it finds sites built with
# Hits: Light - robots
# URL: https://builtwith.com/
User-agent: BuiltWith
Disallow: /

# Category: Non-Profit Data Harvesting
# Hits: Lots - robots & random URLs
# URL: http://commoncrawl.org/big-picture/frequently-asked-questions/
User-agent: CCBot
Disallow: /

# Category: Commercial Advertising Crap
# Hits: Minimal - ads
# URL: https://www.centro.net/
User-agent: Centro Ads.txt Crawler
Disallow: /

# Category: Commercial Brand monitoring
# Hits: Minimal - robots & index
# URL: https://www.checkmarknetwork.com/
User-agent: CheckMarkNetwork
Disallow: /

# Category: Commercial Data Mining Crap
# Hits: Mild - robots
# URL: https://www.clickagy.com/
User-agent: Clickagy Intelligence Bot v2
Disallow: /

# Category: Commercial German Browser / Search Engine
# Hits: Unknown
# URL: https://cliqz.com/en/cliqzbot
User-agent: Cliqzbot
Disallow: /

# Category: Shady Vulnerability Scanner
# Hits: Minimal - index
# URL: https://commonscan.org/
User-agent: commonscan
Disallow: /

# Category: SEO Crap
# Hits: Excessive - robots & random URLs
# URL: https://dataforseo.com/dataforseo-bot
User-agent: DataForSeoBot
Disallow: /

# Category: Commercial Analytics Company
# Hits: Unknown
# URL: https://www.dataprovider.com/
User-agent: Dataprovider
Disallow: /

# Category: Korean Search Engine
# Hits: Minimal - robots & URLs it shouldn't index
# URL: https://www.daum.net/
User-agent: DAUM
Disallow: /

# Category: Domain Harvester
# Hits: Minimal - random URLs
# URL: https://github.com/kgretzky/dcrawl
User-agent: dcrawl
Disallow: /

# Category: Commercial SEO Marketing Crap
# Hits: Unknown
# URL: https://www.deepcrawl.com/bot/
User-agent: deepcrawl
Disallow: /

# Category: Commercial SEO Harvesting
# Hits: Excessive - robots & index
# URL: http://www.domaincrawler.com/
User-agent: domaincrawler
Disallow: /

# Category: Commercial Backlink, Metrics, Rankings, etc...
# Hits: Moderate - robots & random URLs (some broken / shouldn't index)
# URL: https://domainstats.com/
User-Agent: DomainStatsBot
Disallow: /

# Category: Expired Domain Bot?
# Hits: Minimal - robots & random URLs
# URL: https://www.domcop.com/bot
User-agent: DomCopBot
Disallow: /

# Category: Commercial Backlink Crap
# Hits: ABUSIVE - pounding robots
# URL: https://moz.com/
User-agent: dotbot
Disallow: /

User-agent: rogerbot
Disallow: /

# Category: Commercial Marketing Crap
# Hits: Minimal - random URLs
# URL: https://www.exalead.com
User-agent: Exabot
Disallow: /

# Category: Unknown (Website Down)
# Hits: Unknown
# URL: https://extlinks.com/Bot.html
User-agent: ExtLinksBot
Disallow: /

# Category: "New" Search Engine for Maximum Privacy
# Hits: Unknown
# URL: http://femtosearch.com/
User-agent: FemtosearchBot
Disallow: /

# Category:
# Hits: Minimal - robots & index
# URL: https://garlik.com/
User-agent: Garlik
Disallow: /

# Category: Commercial Ad Network
# Hits: Moderate - robots, ads & random URLs
# URL: https://getintent.com/bot.html
User-agent: GetIntent Crawler
Disallow: /

# Category: Gigablast Search Engine
# Hits: Unknown
# URL: https://www.gigablast.com/
User-agent: Gigabot
Disallow: /

User-agent: G-i-g-a-b-o-t
Disallow: /

# Category: Crawling Project
# Hits: Moderate - index & random URLs
# URL: http://glutenfreepleasure.com/
User-agent: Gluten Free Crawler
Disallow: /

# ChatGPT
# URL: https://openai.com/gptbot
User-agent: GPTBot
Disallow: /

# Category: Spell Checker - Indexing?
# Hits: Moderate - Random URLs
# URL: https://www.grammarly.com/
User-agent: Grammarly
Disallow: /

# Category: Commercial Contextual Intelligence Crap
# Hits: Unknown
# URL: https://www.grapeshot.com/crawler/
#User-agent: grapeshot
#Disallow: /

# Category: Commercial Japanese Marketing Firm
# Hits: Moderate - index and random - skips robots!
# URL: http://hatenaantenna.g.hatena.ne.jp/
User-agent: Hatena Antenna
Disallow: /

# Category: Commercial Website Audit & Monitoring
# Hits: Unknown
# URL: https://hexometer.com/
User-agent: Hexometer
Disallow: /

# Category: Chinese GeoIP Wannabe
# Hits: Minimal - index
# URL: https://en.ipip.net/
User-agent: HTTP Banner Detection
Disallow: /

# Category: Random Blogs
# Hits: Moderate - Random URLs
# URL: https://hubpages.com/
User-agent: HubPages
Disallow: /

# Category: Commercial Advertising Crap
# Hits: ABUSIVE - robots
# URL: https://integralads.com/site-indexing-policy/
User-agent: ias_crawler
Disallow: /

# Category: German Search Engine
# Hits: Minimal
# URL: https://infotiger.com/bot
User-agent: InfoTigerBot
Disallow: /

# Category: Italian ISP
# Hits: Unknown
# URL: https://www.tiscali.it/
User-agent: IstellaBot
Disallow: /

# Category: Java based HTTP client
# Hits: Moderate - ads
# URL: https://docs.oracle.com/javase/6/docs/api/java/net/HttpURLConnection.html
User-agent: Jersey
Disallow: /

# Category: Chinese Translation Site
# Hits: Unknown
# URL: https://www.keybot.com/
User-agent: Keybot
Disallow: /

# Category: Crap
# Hits: Unknown
# URL: https://line.me/en/
User-agent: Linespider
Disallow: /

# Category: Translation Bot
# Hits: Unknown
# URL: https://www.linguee.com/
User-agent: Linguee
Disallow: /

# Category: Commercial Link Indexer
# Hits: Unknown
# URL: https://www.linkdex.com/en-us/about/bots/
User-agent: linkdex
Disallow: /

User-agent: linkdexbot
Disallow: /

# Category: Unknown - "security research purposes"
# Hits: Moderate - robots & random URLs
# URL: http://ltx71.com/
User-agent: ltx71
Disallow: /

# MaCoCu - Some BS student project
# URL: https://www.clarin.si/
User-agent: MaCoCu
Disallow: /

# Category: Commercial Social Media Monitoring
# Hits: Minimal - random URLs
# URL: https://www.brandwatch.com/legal/magpie-crawler/
User-agent: magpie-crawler
Disallow: /

# Category: Russian Mail / Social / Other Crap
# Hits: Moderate - robots
# URL: http://go.mail.ru/help/robots
User-agent: Mail.Ru
Disallow: /

User-agent: Mail.RU_Bot
Disallow: /

# Category: Unknown
# Hits: Minimal - robots
# URL: Unknown
User-Agent: MauiBot
Disallow: /

# Category: Commercial Backlinks Crawler
# Hits: Minimal - robots
# URL: https://monitorbacklinks.com
User-agent: MBCrawler
Disallow: /

# Category: Commercial Russian SEO Crap
# Hits: Minimal - index
# URL: https://megaindex.com/
User-agent: MegaIndex.ru
Disallow: /

User-agent: MegaIndex.com
Disallow: /

# Category: Commercial SEO Marketing Crap
# Hits: ABUSIVE - robots
# URL: https://mj12bot.com/
User-agent: MJ12bot
Disallow: /

# Category: Commercial Analytics Crap
# Hits: Minimal - robots & random URLs
# URL: https://moat.com/
User-Agent: moatbot
Disallow: /

# Category: UK Search Engine
# Hits: Moderate - robots
# URL: https://www.mojeek.com/bot.html
# NOTE: See how they behave....
#User-agent: MojeekBot
#Disallow: /

# Category: SEO Crap
# Hits: Moderate - robots & random URLs
# URL: https://metrics-tools.de/robot.html
User-agent: MTRobot
Disallow: /

# Category: Commercial Metrics Crap
# Hits: Unknown
# URL: https://www.netcraft.com/
User-agent: NetcraftSurveyAgent
Disallow: /

# Category: A HTTP client for Android, Kotlin, and Java
# Hits: Unknown
# URL: https://square.github.io/okhttp/
User-agent: okhttp
Disallow: /

# Category: A vertical search engine
# Hits: Minimal - Random URLs
# URL: http://omgili.com/Crawler.html
User-Agent: omgilibot
Disallow: /

User-Agent: omgili
Disallow: /

# Category: Commercial Data Mining Crap
# Hits: Unknown
# URL: https://panscient.com/faq.htm
User-agent: panscient
Disallow: /

# Category: Huawei Search Engine
# Hits: High - robots
# URL: https://aspiegel.com/
User-agent: PetalBot
Disallow: /

# Category: Commercial Site
# Hits: Moderate - Random URLs & Robots
# URL: http://www.pinterest.com/bot.html
User-agent: Pinterestbot
Disallow: /

# Category: Commercial Data Mining Crap
# Hits: Unknown
# URL: https://pipl.com/bot/
User-agent: PiplBot
Disallow: /

# Category: Commercial Advertising
# Hits: Moderate - robots & ads
# URL: https://www.comscore.com/
User-agent: proximic
Disallow: /

# Category: Commercial Pic Search Indexer
# Hits: Unknown
# URL: https://www.picsearch.com/bot.html
User-agent: psbot
Disallow: /

# Category: F-Secure Research Crap
# Hits: Moderate - Random URLs
# URL: http://riddler.io/about
User-agent: Riddler
Disallow: /

# Category: Commercial web scraper for hire.
# Hits: Unknown
# URL: https://scrapinghub.com/
User-agent: Quick-Crawler
Disallow: /

User-agent: Scrapy
Disallow: /

# Category: Commercial SEO Spider Software
# Hits: Minimal - Random URLs
# URL: https://www.screamingfrog.co.uk/
User-agent: Screaming Frog SEO Spider
Disallow: /

# Category: Commercial Media Intelligence Crap
# Hits: Minimal - robots & random URLs
# URL: http://www.carma.com
User-agent: ScooperBot
Disallow: /

# Category: German Search Engine?
# Hits: Excessive
# URL: http://seekport.com/
User-Agent: Seekport Crawler
Disallow: /

# Semantic Scholar - Looking for academic PDFs
User-agent: SemanticScholarBot
Disallow: /

# Category: Commercial Marketing Crap
# Hits: Minimal - robots
# URL: https://www.semrush.com/bot/
User-agent: SemrushBot
Disallow: /

# Category: Commercial SEO Garbage
# URL: https://www.seobility.net/en/bot/
User-agent: Seobility
Disallow: /

# Category: Commercial Backlink Checker
# Hits: Minimal - robots
# URL: https://en.seokicks.de/
User-agent: SEOkicks
Disallow: /

# Category: Commercial SEO Crap
# Hits: Unknown
# URL: https://serpstat.com/
User-agent: serpstatbot
Disallow: /

# Category: Czech Portal / Search Engine
# Hits: Minimal - robots
# URL: https://napoveda.seznam.cz/en/seznamcz-web-search/
User-agent: SeznamBot
Disallow: /

# Category: Unknown (Website Down) - Backlink Checker
# Hits: Unknown
# URL: https://siteexplorer.info
User-agent: SiteExplorer
Disallow: /

# Category: Commercial Advertising Marketing Crap
# Hits: Minimal - robots & index
# URL: http://www.similartech.com/smtbot
User-agent: SMTBot
Disallow: /

# Category: Chinese Search Engine
User-agent: Sogou Spider
Disallow: /

# Category: Commercial SEO Solution Crap
# Hits: Unknown
# URL: https://www.seoprofiler.com/
User-agent: spbot
Disallow: /

# Category: Commercial Language Processing
# Hits: Moderate - robots
# URL: https://nlp.fi.muni.cz/projects/biwec/
User-agent: SpiderLing
Disallow: /

# Category: Some Commercial Crap
# Hits: Moderate
# URL: http://sur.ly/bot.html
User-agent: SurdotlyBot
Disallow: /

# Category: Unknown
# Hits: Moderate - robots & random pages
# URL: Unknown
User-Agent: The Knowledge AI
Disallow: /

# Category: Commercial Social media monitoring & analytics
# Hits: ABUSIVE - robots
# URL: http://www.trendiction.com/en/publisher/bot
User-Agent: trendictionbot
Disallow: /

# Category: Commercial Advertising Crap
# Hits: Moderate - Random URLs
# URL: https://www.thetradedesk.com/us/ttd-content
User-agent: TTD-Content
Disallow: /

# Category: Helps edu prevent plagiarism
# Hits: Minimal
# URL: https://turnitin.com/robot/crawlerinfo.html
User-agent: TurnitinBot
Disallow: /

# Category: Commercial Machine Learning Text Classifier
# Hits: Unknown
# URL: https://www.uclassify.com/
User-agent: uclassify
Disallow: /

# Category: Commercial Russian CMS Detector Crap
# Hits: Unknown
# URL: https://webdatastats.com/policy.html
User-agent: WebDataStats
Disallow: /

# Category: Russian Search Engine
User-agent: Yandex
Disallow: /

User-agent: YandexBot
Disallow: /

# Category: Backlink Checker
# Hits: Minimal - random URLs
# URL: http://www.zombiedomain.net/robot/
User-Agent: Zombiebot
Disallow: /

# Category: Commercial Italian SEO Crap
# Hits: Moderate - Random URLs
# URL: https://suite.seozoom.it/
User-Agent: ZoomBot
Disallow: /

User-Agent: Linkbot
Disallow: /

# Category: Commercial Advertising Crap
# Hits: Excessive - robots
# URL: https://www.zoominfo.com/
User-Agent: ZoominfoBot
Disallow: /

User-agent: *
Disallow: /wp-login.php
Disallow: /xmlrpc.php
Crawl-Delay: 10

Sitemap: https://files.extremeoverclocking.com/g_sitemap.xml.gz