<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Kakkoi &#187; ua</title>
	<atom:link href="http://42.kaizeku.com/taxonomy/ua//feed/" rel="self" type="application/rss+xml" />
	<link>http://42.kaizeku.com</link>
	<description>web development, software, Windows tips and tricks</description>
	<pubDate>Sat, 12 Jul 2008 15:10:01 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
	<language>en</language>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>User Agent Notes</title>
		<link>http://42.kaizeku.com/web-browsers/user-agent-notes-gary-keith/</link>
		<comments>http://42.kaizeku.com/web-browsers/user-agent-notes-gary-keith/#comments</comments>
		<pubDate>Fri, 02 Nov 2007 20:59:05 +0000</pubDate>
		<dc:creator>Nick B</dc:creator>
		
		<category><![CDATA[Web Browsers]]></category>

		<category><![CDATA[amazon]]></category>

		<category><![CDATA[browser]]></category>

		<category><![CDATA[crawler]]></category>

		<category><![CDATA[Download Manager]]></category>

		<category><![CDATA[dynamic]]></category>

		<category><![CDATA[EMail Harvester]]></category>

		<category><![CDATA[gary+keith]]></category>

		<category><![CDATA[GKJ]]></category>

		<category><![CDATA[netprobe]]></category>

		<category><![CDATA[searchengine]]></category>

		<category><![CDATA[Social Bookmarker]]></category>

		<category><![CDATA[translator]]></category>

		<category><![CDATA[ua]]></category>

		<category><![CDATA[useragent]]></category>

		<category><![CDATA[validator]]></category>

		<category><![CDATA[version control]]></category>

		<category><![CDATA[web crawler]]></category>

		<category><![CDATA[webcheck]]></category>

		<category><![CDATA[wget]]></category>

		<guid isPermaLink="false">http://blog.kakkoi.net/tips/user-agent-notes-gary-keith/</guid>
		<description><![CDATA[

Provided courtesy of http://browsers.garykeith.com
Created October 28, 2007 at 11:35:15 PM GMT
wikipedia: User Agent
Accoona
[Accoona-AI-Agent/* (crawler at accoona dot com)]
Accoona-AI-Agent
UA as of November 2005.
Accoona-AI-Agent/1.1.1 (crawler at accoona dot com)
Ad Brokers
[Ad Brokers]
Ad Brokers
This is for minor ad brokers. The major ad-related SE UAs like Google are in their respective parents.
Amazon.com
[Intelix/*]
Intelix
Crawling from Amazon.com IP Address 72.44.63.10
http://www.microton.cz/intelix/
Become
[MonkeyCrawl/*]
MonkeyCrawl
BitTorrent search engine from [...]]]></description>
			<content:encoded><![CDATA[
<!-- google_ad_section_start -->
<p class="notice">Provided courtesy of <a href="http://browsers.garykeith.com">http://browsers.garykeith.com</a><br />
Created October 28, 2007 at 11:35:15 PM GMT</p>
<p>Wikipedia: <a href="http://en.wikipedia.org/wiki/User_agent">User Agent</a></p>
<h2>Accoona</h2>
<p><strong>[Accoona-AI-Agent/* (crawler at accoona dot com)]</strong><br />
Accoona-AI-Agent<br />
UA as of November 2005.<br />
Accoona-AI-Agent/1.1.1 (crawler at accoona dot com)</p>
<h2>Ad Brokers</h2>
<p><strong>[Ad Brokers]</strong><br />
Ad Brokers<br />
This entry covers minor ad brokers. The major ad-related search engine UAs, such as Google&#8217;s, are listed under their respective parent entries.</p>
<h2>Amazon.com</h2>
<p><strong>[Intelix/*]</strong><br />
Intelix<br />
Seen crawling from Amazon.com IP address 72.44.63.10.<br />
http://www.microton.cz/intelix/</p>
<h2>Become</h2>
<p><strong>[MonkeyCrawl/*]</strong><br />
MonkeyCrawl<br />
BitTorrent search engine from Exava.<br />
MonkeyCrawl/0.05 (MonkeyCrawl; http://www.monkeymethods.org; )</p>
<h2>Best of the Web</h2>
<p><strong>[Mozilla/4.0 (compatible; BOTW Spider; *http://botw.org)]</strong><br />
BOTW Spider<br />
Like all BOTW bots, it does not request robots.txt.</p>
<h2>Blue Coat Systems</h2>
<p><strong>[Blue Coat Systems]</strong><br />
Blue Coat Systems<br />
Content filtering products.<br />
http://www.cerberian.com/</p>
<h2>Copyright/Plagiarism</h2>
<p><strong>[IPiumBot laurion(dot)com]</strong><br />
IPiumBot<br />
This appears to be a French plagiarism bot looking for copyright infringements and the like. It did not read robots.txt.<br />
IPiumBot laurion(dot)com</p>
<p><strong>[TurnitinBot/*]</strong><br />
TurnitinBot<br />
Greedy little bastard related to SlySearch; it uses the same IP address. Does not always respect robots.txt.<br />
TurnitinBot/1.5 (http://www.turnitin.com/robot/crawlerinfo.html)<br />
TurnitinBot/1.5 http://www.turnitin.com/robot/crawlerinfo.html<br />
TurnitinBot/2.0 (http://www.turnitin.com/robot/crawlerinfo.html)<br />
TurnitinBot/2.0 http://www.turnitin.com/robot/crawlerinfo.html</p>
<p><strong>[oBot]</strong><br />
oBot<br />
oBot is a spider sent out by ONLY Solutions, a German company that scans the web for sites infringing on the copyrights and logos of its clients.<br />
oBot</p>
<p><strong>[SlySearch/*]</strong><br />
SlySearch<br />
Works in conjunction with Plagiarism.org and TurnItIn.com</p>
<h2>Danger</h2>
<p><strong>[Danger]</strong><br />
Danger<br />
Danger is actually a service currently used by a number of mobile devices, most notably T-Mobile&#8217;s Sidekick. It&#8217;s my understanding that the AvantGo browser is an unnamed part of the package.<br />
http://www.danger.com/index.php</p>
<h2>Dillo</h2>
<p><strong>[Dillo]</strong><br />
Dillo<br />
Dillo is a small web browser written in C.</p>
<h2>Directories</h2>
<p><strong>[acontbot]</strong><br />
acontbot<br />
German search engine.</p>
<p><strong>[Poirot]</strong><br />
Poirot<br />
Its URL returns a blank page with no markup at all. The IP address is owned by ThePlanet. No robots.txt. It made a second request using LWP::Simple/5.803.<br />
Poirot</p>
<p><strong>[Mackster (*)]</strong><br />
Mackster<br />
Mackster seems to feed a few search engines.</p>
<p><strong>[Misterbot]</strong><br />
Misterbot<br />
French directory/search engine.<br />
Misterbot</p>
<p><strong>[Findexa Crawler (http://www.findexa.no/gulesider/article26548.ece)]</strong><br />
Findexa Crawler<br />
From their website: Findexa is the leading directory publisher in Norway, and the official publisher of telecommunication company Telenor. Findexa publishes four directory brands in Norway - BizKit, Telefonkatalogen, Gule Sider and Ditt Distrikt.<br />
Findexa Crawler (http://www.findexa.no/gulesider/article26548.ece)</p>
<p><strong>[Mozilla/5.0 (Votay bot/*)]</strong><br />
Votay<br />
No robots.txt. Directory with user voting to determine site ranking.<br />
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/arts/comics/)<br />
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/business/employment/)<br />
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/recreation/models/)<br />
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/shopping/recreation/)</p>
<p><strong>[aipbot/*]</strong><br />
aipbot<br />
Reads robots.txt but it won&#8217;t index any of my sites for their directory.<br />
aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)<br />
aipbot/2 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)<br />
aipbot/2-beta (aipbot dev; http://aipbot.com; aipbot@aipbot.com)</p>
<p><strong>[FirstGov.gov Search - POC:firstgov.webmasters@gsa.gov]</strong><br />
FirstGov.gov Search<br />
AT&#038;T/Fast Search robot for the FirstGov (U.S. Government) portal.<br />
FirstGov.gov Search - POC:firstgov.webmasters@gsa.gov</p>
<p><strong>[Mozilla/5.0 (?http://www.toile.com/) ToileBot/*]</strong><br />
Toile<br />
This appears to be a very popular French directory. Submissions are screened by humans and if accepted a bot comes around occasionally to validate links. They get secondary (backfill) results from Google.</p>
<h2>DNS Tools</h2>
<p><strong>[Domain Dossier utility*]</strong><br />
Domain Dossier<br />
Free DNS tools. The IP Address for this bot was 70.84.211.98 which has a PTR of mail.webpal.info. If you go to www.webpal.info it&#8217;s the login page for someone&#8217;s website control panel.</p>
<p><strong>[DNSGroup/*]</strong><br />
DNS Group Crawler<br />
Website in URL cannot be accessed. E-mail bounced (This address no longer accepts mail).<br />
DNSGroup/0.1 (DNS Group Crawler; http://www.dnsgroup.com/; crawler@dnsgroup.com)</p>
<h2>Download Managers</h2>
<p><strong>[LMQueueBot/*]</strong><br />
LMQueueBot<br />
On my site it did obey robots.txt, but since it only read the index.asp page afterward I can&#8217;t say conclusively whether it always obeys robots.txt. Regardless, on my sites all download managers are banned.<br />
LMQueueBot/0.1<br />
LMQueueBot/0.2</p>
<p><strong>[BitTorrent/*]</strong><br />
BitTorrent<br />
P2P Client. Not sure why it&#8217;s browsing my websites.<br />
BitTorrent/3.4.2</p>
<p><strong>[Vegas95/*]</strong><br />
Vegas95<br />
Downloads.asp abuser from Japan<br />
Vegas95/1.03 (WinNT; I)</p>
<p><strong>[Star*Downloader/*]</strong><br />
StarDownloader<br />
From their website: Star Downloader is a download manager that accelerates your downloads by splitting the files into several parts and downloading them simultaneously. Download speeds are increased further by choosing the fastest mirror sites.<br />
StarDownloader/1.44<br />
StarDownloader/1.52</p>
<p><strong>[GetRightPro/*]</strong><br />
GetRightPro<br />
This is a download manager. On my site it&#8217;s being used abusively to repeatedly download my files way too quickly.<br />
GetRightPro/6.0a<br />
GetRightPro/6.0b<br />
GetRightPro/6.0beta7</p>
<p><strong>[AutoMate5]</strong><br />
AutoMate5<br />
Part of an automation package from Network Automation that includes FTP downloads.<br />
AutoMate5</p>
<p><strong>[shareaza*]</strong><br />
shareaza<br />
From Wikipedia: Shareaza is a free Windows–based peer-to-peer client which supports the Gnutella, Gnutella2, EDonkey Network, BitTorrent, FTP and HTTP network protocols.<br />
http://www.shareaza.com/</p>
<p><strong>[Xaldon WebSpider*]</strong><br />
Xaldon WebSpider<br />
This is a product from Germany that is basically a download manager. It did not read robots.txt, so I treat it as a website stripper.<br />
Xaldon WebSpider 2.7.b6</p>
<p><strong>[Mozilla/4.0 (compatible; Getleft*)]</strong><br />
Getleft<br />
From their website: So here is my little effort, it is supposed to download complete Web sites. You give it an URL, and down it goes on, happily downloading every linked URL in that site.<br />
Mozilla/4.0 (compatible; Getleft 1.1.1)<br />
Mozilla/4.0 (compatible; Getleft 1.1.2)<br />
Mozilla/4.0 (compatible; Getleft 1.1b2)</p>
<p><strong>[Wget*]</strong><br />
Wget<br />
GNU file downloader.<br />
wget<br />
wget libfetch/2.0<br />
Wget/1.10<br />
Wget/1.10-rc1<br />
Wget/1.10.1<br />
Wget/1.10.1 (Red Hat modified)<br />
Wget/1.10.1-beta1<br />
Wget/1.10.2<br />
Wget/1.10.2 (Red Hat modified)<br />
Wget/1.4.5<br />
Wget/1.5.2<br />
Wget/1.5.3<br />
Wget/1.5.3.1<br />
Wget/1.5.3gold<br />
Wget/1.6<br />
Wget/1.7<br />
Wget/1.7.1<br />
Wget/1.8<br />
Wget/1.8.1<br />
Wget/1.8.1 cvs<br />
Wget/1.8.2<br />
Wget/1.8.2 modified<br />
Wget/1.9<br />
Wget/1.9 cvs-dev<br />
Wget/1.9 cvs-stable<br />
Wget/1.9 cvs-stable (Red Hat modified)<br />
Wget/1.9-beta<br />
Wget/1.9-beta-unoff<br />
Wget/1.9.1<br />
Wget/1.9.1 WebWasher 3.3</p>
<p><strong>[Prozilla*]</strong><br />
Prozilla<br />
From their website: ProZilla is a download accelerator for Linux which gives you a 200% to 300% improvement in your file downloading speeds.<br />
Prozilla - Download accelerator for Linux1.3.6</p>
<p><strong>[NetPumper*]</strong><br />
NetPumper<br />
From their website: It&#8217;s time to stop downloading and start pumping! NetPumper is a new Download Manager that makes downloading files from the Internet easier, faster and safer. It does not read robots.txt and downloads data at an incredibly fast rate.<br />
NetPumper Pro/0.1<br />
NetPumper/1.02<br />
NetPumper/1.03</p>
<p><strong>[Kontiki Client*]</strong><br />
Kontiki Client<br />
Plain and simple it&#8217;s a download accelerator. According to PC Magazine, &#8220;Kontiki was by far the best program at accelerating transfers. Kontiki significantly speeded up most of our downloads while intruding little on to our test machine.&#8221;<br />
Kontiki Client 1.0.20517.1<br />
Kontiki Client 2.0.21031.0 (2a60753a-3587-a47b-6465-8afd59ed1808)<br />
Kontiki Client 2.01.21211.2 (2a60753a-3587-a47b-6465-8afd59ed1808)<br />
Kontiki Client 2.01.21211.2 (61fce3ba-8a4e-bf01-49f6-7109a56a08b0)<br />
Kontiki Client 2.10.30418.1</p>
<p><strong>[Go!Zilla*]</strong><br />
GoZilla<br />
This is made to look like a Go!Zilla clone but it&#8217;s really checking for formmail vulnerabilities.<br />
Go!Zilla 3.3 (www.gozilla.com)<br />
Go!Zilla 3.5 (www.gozilla.com)</p>
<p><strong>[BitBeamer/*]</strong><br />
BitBeamer<br />
BitBeamer is a fully featured FTP client and a download manager that integrates into your web browser.<br />
BitBeamer/1.0</p>
<p><strong>[FreshDownload/*]</strong><br />
FreshDownload<br />
From their website: Fresh Download is an easy-to-use and very fast download manager software that turbo charges downloading files from the Internet, such as your favorite mp3 files, software, picture collections, video, etc.<br />
FreshDownload/4.40</p>
<p><strong>[lftp/3.2.1]</strong><br />
lftp<br />
Russian-based FTP program. It seems most folks on WebmasterWorld (WMW) don&#8217;t like it. I haven&#8217;t decided yet whether to ban it.<br />
lftp/3.2.1</p>
<h2>DYNAMIC</h2>
<p><strong>[DYNAMIC]</strong><br />
DYNAMIC<br />
Does not read robots.txt. I have no idea what this company does. Their website is essentially a blank page.</p>
<h2>E-Mail Harvesters</h2>
<p><strong>[*Larbin*]</strong><br />
Larbin<br />
General-purpose crawler. It can be configured for a variety of tasks, including e-mail harvesting. The user agent can be customized but always includes Larbin somewhere in it. It does read robots.txt, but I don&#8217;t know whether it fully respects it, since all it did was read my robots.txt file and leave. Besides, everything I have read about how this bot is used leads me to believe it should be banned.<br />
larbin (protee@gmail.com)<br />
larbin (samualt9@bigfoot.com)<br />
larbin protee@gmail.com<br />
larbin samualt9@bigfoot.com<br />
larbin sebastien.ailleret@inria.fr<br />
LARBIN-EXPERIMENTAL (efp@gmx.net)<br />
larbin_2.1.1 larbin2.1.1@somewhere.com<br />
larbin_2.2.0 (crawl@compete.com)<br />
larbin_2.2.0 crawl@compete.com<br />
larbin_2.6.2 (kalou@kalou.net)<br />
larbin_2.6.2 (larbin2.6.2@unspecified.mail)<br />
larbin_2.6.2 (ramiro@cs.cornell.edu)<br />
larbin_2.6.2 (vitalbox1@hotmail.com)<br />
larbin_2.6.2 larbin2.6.2@unspecified.mail<br />
larbin_2.6.2 ramiro@cs.cornell.edu<br />
larbin_2.6.3 (admins@uptime.at)<br />
larbin_2.6.3 (alex.victoria@trilogy.com)<br />
larbin_2.6.3 (aol@aol.com)<br />
larbin_2.6.3 (crawler@ip2site.com)<br />
larbin_2.6.3 (gqnmgsp@ruc.edu.cn)<br />
larbin_2.6.3 (larbin-2.6.3@unspecified.mail)<br />
larbin_2.6.3 (larbin2.6.3@ruc.edu.cn)<br />
larbin_2.6.3 (larbin2.6.3@unspecified.mail)<br />
larbin_2.6.3 (larbin2.6.3@verisignlabs.com)<br />
larbin_2.6.3 (larbin2.6.3@versign.com)<br />
larbin_2.6.3 (ltaa_web_crawler@groupes.epfl.ch)<br />
larbin_2.6.3 (n.sugandh@epfl.ch)<br />
larbin_2.6.3 (pimenas@softnet.tuc.gr)<br />
larbin_2.6.3 (sneha@iitk.ac.in)<br />
larbin_2.6.3 (wgao@cs.dal.ca)<br />
larbin_2.6.3 (wgao@genieknows.com)<br />
larbin_2.6.3 admins@uptime.at<br />
larbin_2.6.3 aol@aol.com<br />
larbin_2.6.3 crawler@ip2site.com<br />
larbin_2.6.3 gqnmgsp@ruc.edu.cn<br />
larbin_2.6.3 larbin-2.6.3@unspecified.mail<br />
larbin_2.6.3 larbin2.6.3@ruc.edu.cn<br />
larbin_2.6.3 larbin2.6.3@unspecified.mail<br />
larbin_2.6.3 ltaa_web_crawler@groupes.epfl.ch<br />
larbin_2.6.3 n.sugandh@epfl.ch<br />
larbin_2.6.3 pimenas@softnet.tuc.gr<br />
larbin_2.6.3 sneha@iitk.ac.in<br />
larbin_2.6.3 wgao@cs.dal.ca<br />
larbin_2.6.3 wgao@genieknows.com<br />
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) (Tomi.Silander@hiit.fi)<br />
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) (tsilande@hiit.fi)<br />
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) Tomi.Silander@hiit.fi<br />
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) tsilande@hiit.fi<br />
larbin_extended (larbin@oktie.com)<br />
larbin_extended larbin@oktie.com<br />
larbin_test (nobody@airmail.etn)<br />
larbin_test nobody@airmail.etn<br />
Mozilla/4.0 (compatible; MSIE 6.0; AOL 8.0; SV1; .NET CLR 1.1.4322; Windows NT 5.1) (larbin@unspecified.mail)<br />
Mozilla/4.0 (compatible; MSIE 6.0; AOL 8.0; SV1; .NET CLR 1.1.4322; Windows NT 5.1) larbin@unspecified.mail<br />
Mozilla/4.0 (compatible; MSIE6.0; Windows NT 5.1; Maxthon;) (larbin2.6.3@unspecified.mail)<br />
Mozilla/4.0 (compatible; MSIE6.0; Windows NT 5.1; Maxthon;) larbin2.6.3@unspecified.mail<br />
Mozilla/5.0 (larbin@unspecified.mail)<br />
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 (larbin@unspecified.mail)<br />
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 larbin@unspecified.mail<br />
Mozilla4.0 (larbin2.6.3@unspecified.mail)<br />
Mozilla4.0 larbin2.6.3@unspecified.mail</p>
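<p>The bracketed line at the top of each entry is a browscap-style pattern. In these patterns "*" stands for any run of characters and "?" stands for exactly one character. The Larbin entry shows why the wildcards matter: the same crawler turns up under dozens of different strings. Below is a minimal Python sketch of matching a raw user agent against a few of these patterns. It works under stated assumptions and is not how browscap itself is implemented:</p>
<p>- The names browscap_to_regex, PATTERNS and identify are hypothetical.<br />
- Matching is assumed to be case-insensitive.<br />
- The sketch returns the first pattern that matches instead of ranking overlapping patterns by specificity.</p>

```python
import re

def browscap_to_regex(pattern):
    """Compile a browscap-style pattern into a case-insensitive regex."""
    # Escape every regex metacharacter first, then turn the escaped
    # wildcards back into regex syntax: "*" = any run of characters,
    # "?" = exactly one character.
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    # Case-insensitive matching is an assumption of this sketch; without it
    # "*Larbin*" would miss lower-case strings such as "larbin_2.6.3 ...".
    return re.compile(escaped, re.IGNORECASE)

# Hypothetical lookup table: a few patterns copied from the list above.
PATTERNS = {
    "*Larbin*": "Larbin",
    "Wget*": "Wget",
    "Mozilla/4.0 (compatible; Getleft*)": "Getleft",
    "TurnitinBot/*": "TurnitinBot",
}
COMPILED = [(browscap_to_regex(p), name) for p, name in PATTERNS.items()]

def identify(user_agent):
    """Return the name of the first pattern matching the whole UA, or None."""
    for regex, name in COMPILED:
        if regex.fullmatch(user_agent):
            return name
    return None

for ua in (
    "larbin_2.6.3 (crawler@ip2site.com)",
    "wget libfetch/2.0",
    "Mozilla/4.0 (compatible; Getleft 1.1b2)",
    "BitTorrent/3.4.2",
):
    print(repr(ua), identify(ua))
```

<p>Run against strings taken from this page, the sketch catches the lower-case larbin and wget variants. BitTorrent/3.4.2 has no pattern in the table, so it comes back as None.</p>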
<p><strong>[*www4mail/*]</strong><br />
www4mail<br />
From their website: www4mail is an open source application, that allows you to navigate off-line and search the whole Internet via electronic mail (e-mail) by using any standard Web browser and a MIME (Multipurpose Internet Mail Extensions) aware e-mail program. It did not request robots.txt.<br />
Mozilla/4.0 www4mail/3.0 libwww-FM/2.14 (Unix; I)<br />
Mozilla/4.5 www4mail/3.0 libwww-FM/2.14 (Unix; I)<br />
www4mail/2.4 libwww-FM/2.14 (Unix; I)</p>
<p><strong>[EVE-minibrowser/*]</strong><br />
EVE-minibrowser<br />
EVE Online is an MMOG. According to Project Honeypot, EVE-minibrowser is being used extensively as an e-mail harvester. For that reason I&#8217;ve put it under this parent and banned it.<br />
http://www.projecthoneypot.org/bsh_X19tb2RlPWdsb2JhbCZ1YWc9RVZFLW1pbmlicm93c2VyJTJGMy4w<br />
http://en.wikipedia.org/wiki/EVE_Online</p>
<p><strong>[Franklin Locator*]</strong><br />
Franklin Locator<br />
See the Links for a discussion on WebmasterWorld.</p>
<p><strong>[Mozilla/4.0 (compatible; Advanced Email Extractor*)]</strong><br />
Advanced Email Extractor<br />
http://www.mailutilities.com/aee/<br />
Mozilla/4.0 (compatible; Advanced Email Extractor v2.76)<br />
Mozilla/4.0 (compatible; Advanced Email Extractor*)</p>
<p><strong>[*E-Mail Address Extractor*]</strong><br />
E-Mail Address Extractor<br />
This is one of many products from Beijing Express.</p>
<h2>ELinks 0.10</h2>
<p><strong>[ELinks 0.10]</strong><br />
ELinks<br />
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.</p>
<h2>ELinks 0.11</h2>
<p><strong>[ELinks 0.11]</strong><br />
ELinks<br />
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.</p>
<h2>ELinks 0.12</h2>
<p><strong>[ELinks 0.12]</strong><br />
ELinks<br />
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.</p>
<h2>ELinks 0.9</h2>
<p><strong>[ELinks 0.9]</strong><br />
ELinks<br />
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.</p>
<h2>Emacs/W3</h2>
<p><strong>[Emacs/W3]</strong><br />
Emacs/W3<br />
Emacs/W3 is a full-featured web browser, written entirely in Emacs-Lisp.</p>
<h2>Entireweb</h2>
<p><strong>[Entireweb]</strong><br />
Entireweb<br />
Reads but does respect robots.txt.</p>
<h2>Envolk</h2>
<p><strong>[Envolk]</strong><br />
Envolk<br />
Even after an upgrade, and despite their bot page stating that they read and respect robots.txt, they just don&#8217;t.</p>
<h2>Exalead</h2>
<p><strong>[Mozilla/5.0 (compatible; Exabot/3.0;*)]</strong><br />
Exabot<br />
2007/1/18: They finally created a proper UA so this one isn&#8217;t banned like the others are.</p>
<p><strong>[Exalead NG/*]</strong><br />
Exalead NG<br />
This is Exalead&#8217;s image preview bot.<br />
Exalead NG/MimeLive Client (convert/http/0.123)<br />
Exalead NG/MimeLive Client (convert/http/0.129)<br />
Exalead NG/MimeLive Client (convert/http/0.141)<br />
Exalead NG/MimeLive Client (convert/http/0.143)<br />
Exalead NG/MimeLive Client (convert/http/0.146)</p>
<p><strong>[Exalead]</strong><br />
Exalead<br />
French search engine. Does not read or respect robots.txt.<br />
http://www.exalead.com/search</p>
<p><strong>[NG-Search/*]</strong><br />
NG-SearchBot<br />
German search engine. Well-behaved software; it respected robots.txt.<br />
NG-Search/0.90 (NG-SearchBot; http://www.ng-search.com; )</p>
<p><strong>[ng/*]</strong><br />
Exalead Previewer<br />
This is Exalead&#8217;s image preview bot.<br />
NG/1.0</p>
<h2>Feeds Blogs</h2>
<p><strong>[Net::Trackback/*]</strong><br />
Net::Trackback<br />
From their website: This package is an object-oriented interface for developing Trackback clients and servers.</p>
<h2>Feeds Syndicators</h2>
<p><strong>[RSS-SPIDER (*)]</strong><br />
Feeds Syndicators<br />
Looks at default root page for RSS tag(s).</p>
<p><strong>[Mozilla/5.0 (*aggregator:TailRank; http://tailrank.com/robot)*]</strong><br />
TailRank<br />
From their website: Tailrank is a service that monitors blogs trying to find interesting memes and hot stories. We have a &#8216;robot&#8217; which analyzes blogs periodically trying to find interesting stories. If we find that a story on your site is &#8216;hot&#8217; we promoted it to our front page. This is a good thing and can drive a lot traffic to your website.<br />
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; aggregator:Tailrank; http://tailrank.com/robot) Gecko/20021130</p>
<p><strong>[SimplePie/*]</strong><br />
SimplePie<br />
SimplePie is a very fast and easy-to-use class, written in PHP, for reading RSS and Atom syndication feeds.<br />
SimplePie/1.0 Beta (Feed Parser; http://www.simplepie.org/; Allow like Gecko) Build/20060129</p>
<p><strong>[MagpieRSS/* (*)]</strong><br />
MagpieRSS<br />
Older version of what has become SimplePie.</p>
<p><strong>[Feedreader * (Powered by Newsbrain)]</strong><br />
Newsbrain<br />
I haven&#8217;t been able to find out any information about Newsbrain. Sure wish they&#8217;d include a URL!<br />
Feedreader 3.01 (Powered by Newsbrain)</p>
<p><strong>[RssBandit/*]</strong><br />
RssBandit<br />
Very abusive bot. I have it banned via my httpd.ini.<br />
RssBandit/1.3.0.42 (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org)<br />
RssBandit/1.3.0.45 (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org)<br />
RssBandit/1.3.0.45 (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org) (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org)</p>
<p><strong>[Akregator/*]</strong><br />
Akregator<br />
From their website: Akregator is a news feed reader for the KDE desktop.<br />
http://akregator.kde.org/</p>
<p><strong>[Feed43 Proxy/* (*)]</strong><br />
Feed For Free<br />
Does not read robots.txt.</p>
<p><strong>[FeedBurner/*]</strong><br />
FeedBurner<br />
Reads but does not respect robots.txt.<br />
http://www.FeedBurner.com</p>
<p><strong>[Particls]</strong><br />
Particls<br />
From their website: Particls helps you track your favourite sites, topics and apps by displaying desktop alerts for important changes.<br />
http://www.particls.com</p>
<p><strong>[Mozilla/5.0 (RSS Reader Panel)]</strong><br />
RSS Reader Panel<br />
RSS feed reader extension for Mozilla Firefox.<br />
Mozilla/5.0 (RSS Reader Panel)</p>
<p><strong>[intraVnews/*]</strong><br />
intraVnews<br />
From their website: Feed Reader and RSS Aggregator for Outlook.<br />
intraVnews/1.1 (http://www.intravnews.com/)<br />
intraVnews/1.12 (http://www.intravnews.com/)</p>
<p><strong>[Mobitype * (compatible; Mozilla/*; MSIE *.*; Windows *)]</strong><br />
Mobitype<br />
Appears to be a France-based mobile RSS Feed Aggregator that focuses on blogs. The site is mostly in French.<br />
http://www.mobitype.com/</p>
<p><strong>[Cocoal.icio.us/* (*)*]</strong><br />
Cocoal.icio.us<br />
No robots.txt. This appears to be some sort of RSS/podcast search engine. I have no idea why it&#8217;s crawling my websites.<br />
Cocoal.icio.us/1.0 (v38) (Mac OS X; http://www.scifihifi.com/cocoalicious)</p>
<p><strong>[Strategic Board Bot (?http://www.strategicboard.com)]</strong><br />
Strategic Board Bot<br />
Did not read robots.txt. From their website: Strategic Board is a Web 2.0 search engine that aggregates IT related RSS feeds. We automatically monitor and identify new IT related blogs.<br />
Strategic Board Bot ( http://www.strategicboard.com)</p>
<p><strong>[*NetVisualize*]</strong><br />
NetVisualize<br />
From their website: NetVisualize Favorites Organizer lets you manage your bookmarks and favorites the way you remember them - visually! NetVisualize creates thumbnail images of your favorite websites, and is as simple to use and familiar as Windows Explorer.<br />
Mozilla/4.0 (compatible; MSIE 5.0; NetVisualize b202)<br />
Mozilla/4.0 (compatible; MSIE 5.0; NetVisualize b203)</p>
<p><strong>[Omnipelagos*]</strong><br />
Omnipelagos<br />
From their website: Omnipelagos finds the shortest paths between any two things. I don&#8217;t know what that means.</p>
<p><strong>[JetBrains Omea Reader*]</strong><br />
Omea Reader<br />
From their website: Omea Reader is an easy to use, all-in-one RSS/ATOM feed reader, newsgroup reader, and web bookmark manager. From my point of view: that may be true, but it&#8217;s being used to check the wrong page for browscap.ini updates, so I need to ban it.<br />
JetBrains Omea Reader 1.0.2 (http://www.jetbrains.com/omea_reader/)<br />
JetBrains Omea Reader 1.0.4 (http://www.jetbrains.com/omea_reader/)<br />
JetBrains Omea Reader 2.0 (http://www.jetbrains.com/omea/reader/)<br />
JetBrains Omea Reader 2.0 Release Candidate 1 (http://www.jetbrains.com/omea_reader/)<br />
JetBrains Omea Reader 2.0 Release Candidate 6 (http://www.jetbrains.com/omea/reader/)<br />
JetBrains Omea Reader 2.1.2 (http://www.jetbrains.com/omea/reader/)<br />
JetBrains Omea Reader 2.1.6 (http://www.jetbrains.com/omea/reader/)</p>
<h2>Flatland Industries</h2>
<p><strong>[Flatland Industries]</strong><br />
Flatland Industries<br />
Log spammer.</p>
<h2>FrontPage</h2>
<p><strong>[FrontPage]</strong><br />
FrontPage<br />
From their website: iSiloX is the desktop application that converts content to the iSilo 3.x/4.x document format, enabling you to carry that content on your Palm OS PDA, Pocket PC PDA, Windows CE Handheld PC, or Windows computer for viewing using iSilo.</p>
<h2>General Crawlers</h2>
<p><strong>[DomainsDB.net MetaCrawler*]</strong><br />
DomainsDB<br />
Reverse IP and NS lookup tool.<br />
DomainsDB.net MetaCrawler v.0.9.7b (http://domainsdb.net/)<br />
DomainsDB.net MetaCrawler v.0.9.7c (http://domainsdb.net/)</p>
<p><strong>[iVia Page Fetcher*]</strong><br />
iVia Software<br />
Claims to respect robots.txt but it never even read it.<br />
iVia Page Fetcher (http://ivia.ucr.edu/useragents.shtml)</p>
<p><strong>[Mozilla/5.0 (compatible; AboutUsBot/*)]</strong><br />
AboutUsBot<br />
Did not read robots.txt. From their website: Gathers descriptive information about a website from several sources to build a Wiki Page.</p>
<p><strong>[WeBoX/*]</strong><br />
WeBoX<br />
From their website: Web Collector &#038; Text Collector &#038; Web Database &#038; Tab Browser &#038; Tab Editor &#038; RSS Reader.</p>
<p>Basically a general crawler. It&#8217;s written in Japanese.</p>
<p><strong>[BabalooSpider/1.*]</strong><br />
BabalooSpider<br />
Comes from the same IP address as Exploder/0.1. As of 25 February 2007 the website is just a placeholder.<br />
http://www.babaloo.si</p>
<p><strong>[KBeeBot/0.*]</strong><br />
KBeeBot<br />
No robots.txt.</p>
<p><strong>[Nozilla/P.N (Just for IDS woring)]</strong><br />
Nozilla/P.N<br />
Looking for World of Warcraft vulnerabilities.</p>
<p><strong>[ScollSpider/2.*]</strong><br />
ScollSpider<br />
Despite the claims on their website this bot does not read robots.txt.<br />
http://www.webwobot.com/ScollSpider.php</p>
<p><strong>[Lorkyll *.* -- lorkyll@444.net]</strong><br />
Lorkyll<br />
I&#8217;ve had traffic from this bot&#8217;s netrange and banned individual addresses. But as of February 2007 enough bad bots come from this Class C range for me to ban the entire range via my firewall.</p>
<p><strong>[West Wind Internet Protocols*]</strong><br />
Versatel<br />
No robots.txt.<br />
http://www.versanet.de</p>
<p><strong>[Marvin v0.3]</strong><br />
MedHunt<br />
Marvin (Multi-Agent Retrieval Vagabond on Information Networks) is a medical information spider linked to MedHunt.<br />
Marvin v0.3</p>
<p><strong>[*autokrawl*]</strong><br />
autokrawl<br />
Read robots.txt way too late in the crawl. I also saw Voyager/1.0 crawling from the same IP Address.</p>
<p><strong>[Comodo HTTP(S) Crawler*]</strong><br />
Comodo HTTP Crawler<br />
They appear to be SSL providers, so why are they crawling my website? Both URLs in the user agent redirect to a home page. It did read and appear to obey robots.txt, so I&#8217;ll try blocking it manually for now.<br />
Comodo HTTP(S) Crawler - http://www.instantssl.com/crawler<br />
Comodo HTTP(S) Crawler - http://www.instantssl.com/crawler, http://www.whichssl.com/crawler</p>
<p><strong>[Cynthia 1.0]</strong><br />
Cynthia<br />
From their website: Cynthia is a web content accessibility validation solution, it is designed to identify errors in design related to Section 508 standards and the WCAG guidelines.<br />
Cynthia 1.0</p>
<p><strong>[Diff-Engine*]</strong><br />
General Crawlers<br />
It reads robots.txt and appears to obey it, although I have no idea what it is or what it does.<br />
Diff-Engine (Liang.Lu@cern.ch)<br />
Diff-Engine Liang.Lu@cern.ch</p>
<p><strong>[FRSEEKBOT]</strong><br />
FRSEEKBOT<br />
This is a French search engine.</p>
<p><strong>[HTTP-Test-Program]</strong><br />
WebBug<br />
WebBug lets you enter a URL, then displays exactly what it sends to the Web Server and, when the response is received, exactly what the Web Server sends back.<br />
HTTP-Test-Program</p>
<p><strong>[http://www.almaden.ibm.com/cs/crawler*]</strong><br />
IBM&#8217;s WebFountain<br />
The information collected from the web is currently being used in IBM&#8217;s Research Division for several search/indexing projects.<br />
http://www.almaden.ibm.com/cs/crawler<br />
http://www.almaden.ibm.com/cs/crawler [rc1.wf.ibm.com]<br />
http://www.almaden.ibm.com/cs/crawler [rc2.wf.ibm.com]<br />
http://www.almaden.ibm.com/cs/crawler [bc13]<br />
http://www.almaden.ibm.com/cs/crawler [bc14]<br />
http://www.almaden.ibm.com/cs/crawler [bc15]<br />
http://www.almaden.ibm.com/cs/crawler [bc16]<br />
http://www.almaden.ibm.com/cs/crawler [bc18]<br />
http://www.almaden.ibm.com/cs/crawler [bc2]<br />
http://www.almaden.ibm.com/cs/crawler [bc26]<br />
http://www.almaden.ibm.com/cs/crawler [bc34]<br />
http://www.almaden.ibm.com/cs/crawler [bc35]<br />
http://www.almaden.ibm.com/cs/crawler [bc7]<br />
http://www.almaden.ibm.com/cs/crawler [bc8]<br />
http://www.almaden.ibm.com/cs/crawler [bc9]<br />
http://www.almaden.ibm.com/cs/crawler [bv2m304]<br />
http://www.almaden.ibm.com/cs/crawler [c01]<br />
http://www.almaden.ibm.com/cs/crawler [c11]<br />
http://www.almaden.ibm.com/cs/crawler [c12]<br />
http://www.almaden.ibm.com/cs/crawler [fc15]<br />
http://www.almaden.ibm.com/cs/crawler [fc2]<br />
http://www.almaden.ibm.com/cs/crawler [fc3]<br />
http://www.almaden.ibm.com/cs/crawler [fc4]<br />
http://www.almaden.ibm.com/cs/crawler [fc7]<br />
http://www.almaden.ibm.com/cs/crawler [fc8]<br />
http://www.almaden.ibm.com/cs/crawler [fc9]<br />
http://www.almaden.ibm.com/cs/crawler [hc2]<br />
http://www.almaden.ibm.com/cs/crawler [hc3]<br />
http://www.almaden.ibm.com/cs/crawler [hc5]<br />
http://www.almaden.ibm.com/cs/crawler [wf223]<br />
http://www.almaden.ibm.com/cs/crawler [wf84]<br />
http://www.almaden.ibm.com/cs/crawler/<br />
http://www.almaden.ibm.com/cs/crawler/ [v08l:odrayab:9.1.64.25]</p>
<p><strong>[Mozilla/4.0 (compatible; MSIE 4.01; Vonna.com b o t)]</strong><br />
Vonna.com<br />
Did not read robots.txt.</p>
<p><strong>[ccubee/*]</strong><br />
ccubee<br />
From their website: [The product] integrates functionality of all significant areas of information processing that are focused on the content and the structure analysis of publicly available or internal resources.<br />
ccubee/3.0<br />
ccubee/3.1<br />
ccubee/3.2<br />
ccubee/3.5</p>
<p><strong>[XML Sitemaps Generator*]</strong><br />
XML Sitemaps Generator<br />
Apparently this is the user agent you&#8217;ll see when creating an XML sitemap via Google, and perhaps others.</p>
<p><strong>[Mozilla/5.0 (compatible; Kyluka crawl; http://www.kyluka.com/crawl.html; crawl@kyluka.com)]</strong><br />
Kyluka<br />
Questionable practices, but for now they&#8217;re not banned, just asked to stay away via robots.txt.</p>
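<p>A minimal sketch of how such a robots.txt request can be checked, using Python&#8217;s standard <code>urllib.robotparser</code>. The <code>Kyluka</code> token in the sample file is an assumption, since the name the bot actually honours isn&#8217;t recorded in these notes, and <code>www.example.com</code> is a placeholder. <code>urllib.robotparser</code> only compares the part of the agent string before the first "/", so the check passes the bare product token rather than the full <code>Mozilla/5.0 (compatible; Kyluka crawl; ...)</code> string.</p>

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt. The "Kyluka" token is an assumption: use whatever name
# the bot documents as the one it honours.
ROBOTS_TXT = """\
User-agent: Kyluka
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "http://www.example.com/index.html"  # placeholder URL
    # Pass the product token: robotparser ignores everything after the first "/".
    print("Kyluka allowed:", rp.can_fetch("Kyluka", url))        # False
    print("Googlebot allowed:", rp.can_fetch("Googlebot", url))  # True
```

<p>Passing the full user-agent string instead would reduce it to "Mozilla", which only matches the <code>User-agent: *</code> record, so the request would wrongly be allowed.</p>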
<p><strong>[QuickFinder Crawler]</strong><br />
QuickFinder<br />
No robots.txt. The IP Address I saw belonged to Novell, Inc.<br />
QuickFinder Crawler</p>
<p><strong>[Links4US-Crawler,*]</strong><br />
Links4US-Crawler<br />
Claims to use data from DMOZ.org, so why are they crawling sites themselves, especially without reading robots.txt?<br />
Links4US-Crawler, ( http://links4us.com/)</p>
<p><strong>[SynapticSearch/AI Crawler 1.?]</strong><br />
SynapticSearch<br />
A search quickly turned up synapticsearch.com, which redirects to alpha-leonis.lids.mit.edu/ss/. It&#8217;s a distributed crawler that is really not ready for prime time. Most of the links on their website are broken, and there is no contact information.<br />
I&#8217;m not sure what to do with this one yet. For now I&#8217;m adding it to browscap as a general crawler and flagging it as isBanned.<br />
If someone from this project at MIT can contact me, I&#8217;d sure appreciate it. I don&#8217;t want to ban this bot, but unless it learns to behave, for example by reading and respecting robots.txt, that&#8217;ll be my only option.</p>
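<p>The bracketed headings in these notes are browscap patterns: <code>*</code> stands for any run of characters and <code>?</code> for exactly one, which is why <code>[niXXieBot?Foster*]</code> also covers <code>niXXieBot-Foster</code>. The Python sketch below translates such patterns into regular expressions and looks up an isBanned-style flag. It assumes case-insensitive whole-string matching and a longest-pattern-wins tie-break; the pattern list and the <code>BANNED</code> set are hypothetical examples, not browscap&#8217;s actual data or PHP&#8217;s <code>get_browser()</code> logic.</p>

```python
import re

# Hypothetical sample of browscap-style patterns taken from these notes.
PATTERNS = ["niXXieBot?Foster*", "Clushbot/*", "SynapticSearch/AI Crawler 1.?"]
# Hypothetical stand-in for browscap's isBanned flag.
BANNED = {"niXXieBot?Foster*", "SynapticSearch/AI Crawler 1.?"}


def browscap_to_regex(pattern):
    """Translate a browscap-style pattern: '*' -> any run, '?' -> one char."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def match_user_agent(user_agent, patterns=PATTERNS):
    """Return the matching pattern, preferring the longest (assumed tie-break)."""
    hits = [p for p in patterns if browscap_to_regex(p).fullmatch(user_agent)]
    return max(hits, key=len) if hits else None


def is_banned(user_agent):
    """True when the best-matching pattern carries the isBanned-style flag."""
    return match_user_agent(user_agent) in BANNED


if __name__ == "__main__":
    for ua in ["niXXieBot-Foster",
               "Clushbot/3.62-Laomedon ( http://www.clush.com/bot.html)",
               "Mozilla/4.1"]:
        print(ua, "->", match_user_agent(ua), "banned:", is_banned(ua))
```

<p>A single <code>Clushbot/*</code> entry covers every version string in the Clushbot list, which is the main reason the catalogue is organised around wildcard patterns rather than exact strings.</p>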
<p><strong>[WebFilter Robot*]</strong><br />
WebFilter Robot<br />
From their website: WebFilter is an intelligent agent that filters the new pages announced on the NCSA What&#8217;s New Page, looking for Web resources that match your ongoing interests.<br />
WebFilter Robot 1.0</p>
<p><strong>[WebTrends/*]</strong><br />
WebTrends<br />
Related to WebTrends reporting.<br />
WebTrends/3.0 (WinNT)</p>
<p><strong>[Willow Internet Crawler by Twotrees V*]</strong><br />
Willow Internet Crawler<br />
Willow Internet Crawler by Twotrees, used for content filtering.<br />
Willow Internet Crawler by Twotrees V2.1</p>
<p><strong>[BravoBrian BStop*]</strong><br />
BravoBrian BStop<br />
Dutch model car website directory<br />
BravoBrian BStop<br />
BravoBrian bstop.bravobrian.it</p>
<p><strong>[CJNetworkQuality; http://www.cj.com/networkquality]</strong><br />
CJNetworkQuality<br />
It appears to read and respect robots.txt, but it&#8217;s still a controversial crawler, so some folks may want to move it to Website Strippers. See the discussion on WebmasterWorld.<br />
CJNetworkQuality; http://www.cj.com/networkquality</p>
<p><strong>[n4p_bot*]</strong><br />
n4p_bot<br />
From their website: This is a peer-to-peer protocol for distributing files. It makes use of the upstream bandwidth of every downloader to increase the effectiveness of the distribution as a whole, and to gain advantage on the part of the downloader. The term they use to describe this is &#8220;torrents&#8221; as in BitTorrent software.<br />
n4p_bot (crawler@n4p.com)<br />
n4p_bot crawler@n4p.com</p>
<p><strong>[semanticdiscovery/*]</strong><br />
Semantic Discovery<br />
From their website: The Semantic Discovery robot collects content from the web to be matched into focused &#8220;product and service&#8221; taxonomies and then published in multiple search engine directories.<br />
semanticdiscovery/0.2(http://www.semanticdiscovery.com/sd/robot.html)<br />
semanticdiscovery/0.3(http://www.semanticdiscovery.com/sd/robot.html<br />
semanticdiscovery/0.4(http://www.semanticdiscovery.com/sd/robot.html</p>
<p><strong>[UbiCrawler/*]</strong><br />
UbiCrawler<br />
Yet another university project of some kind.</p>
<p><strong>[UCmore]</strong><br />
UCmore<br />
This is a toolbar for IE.<br />
UCmore</p>
<p><strong>[niXXieBot?Foster*]</strong><br />
niXXiebot-Foster<br />
Claims to be the first contextual advertising company in the UK. Their bot was very abusive. For starters, it read robots.txt before every single one of the 36,000 pages it crawled. And while it was crawling with this user agent, it was also crawling from the same IP Address using niXXieBot-Foster as the user agent.</p>
<p><strong>[SMBot/*]</strong><br />
SMBot<br />
Appears to be a tool offered by Amazon.com.</p>
<p><strong>[searchbot admin@google.com]</strong><br />
searchbot<br />
Trying to spoof Google and doing a bad job of it!<br />
searchbot admin@google.com</p>
<p><strong>[PhpDig/*]</strong><br />
PhpDig<br />
This is the default user agent for PhpDig. Usually a client will modify this user agent to reflect their own search engine. From the PhpDig website: PhpDig is a PHP and MySQL web spider and search engine, released under the GNU General Public License.<br />
PhpDig/PHPDIG_VERSION ( http://www.phpdig.net/robot.php)</p>
<p><strong>[eventax/*]</strong><br />
eventax<br />
Searches for online events, mostly in Germany.<br />
eventax/1.3 (eventax; http://www.eventax.de/; info@eventax.de)</p>
<p><strong>[Tecomi Bot (http://www.tecomi.com/bot.htm)]</strong><br />
Tecomi<br />
Bot page does not exist. Site is under development.<br />
Tecomi Bot (http://www.tecomi.com/bot.htm)</p>
<p><strong>[dragonfly(ebingbong#playstarmusic.com)]</strong><br />
eBingBong<br />
Did not request robots.txt.<br />
http://www.ebingbong.com/</p>
<p><strong>[htdig/*]</strong><br />
ht://Dig<br />
From their website: a complete indexing and searching system for a domain or intranet.<br />
htdig/3.1.2 (webmaster@neurovia.umn.edu)<br />
htdig/3.1.6 (romieu@bastide-medical.fr)<br />
htdig/3.1.6 (unconfigured@htdig.searchengine.maintainer)<br />
htdig/3.1.6 (webmaster@choiceoneonline.com)</p>
<p><strong>[ArachnetAgent*]</strong><br />
General Crawlers<br />
This appears to be related to the TuringOS crawler.<br />
ArachnetAgent 2.3</p>
<p><strong>[grub crawler]</strong><br />
grub crawler<br />
From their website: Leveraging the power of distributed computing, Grub allows everyone with an Internet connection to participate in the last frontier of discovery. By downloading the unique screensaver, you can donate your computer&#8217;s unused bandwidth to probing the hidden depths of the Web.<br />
grub crawler</p>
<p><strong>[Mozilla/4.0 (compatible; N-Stealth)]</strong><br />
N-Stealth<br />
From their website: N-Stealth is a vulnerability-assessment product that scans web servers to identify security problems and weaknesses that may allow an attacker to gain privileged access.<br />
Mozilla/4.0 (compatible; N-Stealth)</p>
<p><strong>[Lincoln State Web Browser]</strong><br />
Lincoln State Web Browser<br />
Does not read robots.txt.<br />
Lincoln State Web Browser</p>
<p><strong>[Seeker.lookseek.com]</strong><br />
LookSeek<br />
Does not read robots.txt.<br />
Seeker.lookseek.com</p>
<p><strong>[DTAAgent]</strong><br />
DTAAgent<br />
The user agent contains no details about what website it&#8217;s from or what it&#8217;s doing. It read robots.txt. An rDNS lookup returned an error. All I know is that the IP Address is registered with RIPE and appears to belong to a German ISP/host.<br />
DTAAgent</p>
<p><strong>[nicebot]</strong><br />
nicebot<br />
This bot has a mixed reputation. In some cases it respects robots.txt. In other cases it doesn&#8217;t bother reading robots.txt.<br />
nicebot</p>
<p><strong>[ShopWiki/1.0*]</strong><br />
ShopWiki<br />
Crawler for ShopWiki website. It appears to read and respect robots.txt.<br />
ShopWiki/1.0 ( http://www.shopwiki.com/)<br />
ShopWiki/1.0 ( http://www.shopwiki.com/wiki/Help:Bot)</p>
<p><strong>[Mozilla/5.0 (compatible; Vermut*)]</strong><br />
Vermut<br />
From their website: Vermut is a web crawler which collects web content for general analysis and building of search indexes. Me again: It appears to be part of AOL, but I can&#8217;t find absolute proof of that.</p>
<p><strong>[HTTP/1.0]</strong><br />
HTTP/1.0<br />
Did not request robots.txt. IP Address resolves to opticaljungle.com. There is no website at that URL.<br />
HTTP/1.0</p>
<p><strong>[OpenTaggerBot (http://www.opentagger.com/opentaggerbot.htm)]</strong><br />
OpenTaggerBot<br />
Social bookmarking site.<br />
OpenTaggerBot (http://www.opentagger.com/opentaggerbot.htm)</p>
<p><strong>[Tagyu Agent/1.0]</strong><br />
Tagyu<br />
Converts text or a URL to tags.<br />
Tagyu Agent/1.0</p>
<p><strong>[Visicom Toolbar]</strong><br />
Visicom Toolbar<br />
An IE toolbar made with Visicom Media Dynamic Toolbar software.<br />
Visicom Toolbar</p>
<p><strong>[RixBot (http://babelserver.org/rix)]</strong><br />
RixBot<br />
Some sort of search engine for REBOL-related scripts and news.<br />
RixBot (http://babelserver.org/rix)</p>
<p><strong>[Mozilla/4.1]</strong><br />
General Crawlers<br />
No robots.txt.<br />
Mozilla/4.1</p>
<p><strong>[Mozilla Compatible (MS IE 3.01 WinNT)]</strong><br />
General Crawlers<br />
The user agent is just too old and odd to be a real browser. That, combined with the fact that it ripped valuable content from one of my websites without even reading robots.txt, makes me mad. That&#8217;s why it&#8217;s banned.<br />
Mozilla Compatible (MS IE 3.01 WinNT)</p>
<p><strong>[SurveyBot/*]</strong><br />
SurveyBot<br />
Domain availability checker. It&#8217;s unclear why they need to probe my sites each week when other whois services don&#8217;t. Plus I get no traffic from them at all, so they&#8217;re banned.<br />
SurveyBot/2.2 <a href='http://www.whois.sc'>Whois Source</a><br />
SurveyBot/2.3 (Whois Source)</p>
<p><strong>[Search Fst]</strong><br />
Search Fst<br />
Seems to follow behind human-powered user agents, indexing pages the person has just visited. It never reads robots.txt. The company behind it is an engineering firm called Fay, Spofford &#038; Thorndike, Inc.<br />
Search Fst</p>
<p><strong>[sohu*]</strong><br />
sohu-search<br />
Some sort of Chinese crawler. No robots.txt.<br />
sohu agent<br />
sohu-search</p>
<p><strong>[mozilla/5.0 (compatible; genevabot http://www.healthdash.com)]</strong><br />
Healthdash<br />
From their website: Healthdash is the fastest and easiest way to find, understand and manage information about consumer health.<br />
mozilla/5.0 (compatible; genevabot http://www.healthdash.com)</p>
<p><strong>[botlist]</strong><br />
botlist<br />
This bot did not read robots.txt. The information on file for the IP Address appears to be spoofed.</p>
<p><strong>[shelob v1.*]</strong><br />
shelob<br />
No robots.txt.</p>
<p><strong>[Gaisbot*]</strong><br />
Gaisbot<br />
From their website: Gaisbot is the agent software of GAIS which crawls web sites all over the world, in order to build a search engine like google or altavista.</p>
<p><strong>[BruinBot*]</strong><br />
BruinBot<br />
From their website: In the WebArchive project, we are interested in building a Web search engine prototype, where the users can ask for different versions of pages collected during different periods of time.<br />
BruinBot ( http://webarchive.cs.ucla.edu/bruinbot.html)</p>
<p><strong>[CacheabilityEngine/*]</strong><br />
CacheabilityEngine<br />
From their website: To help you understand how Web Caches will treat a Web page, the Cacheability Engine will look at a URL (and optionally any images or objects associated with it), giving both specific cache-related data about it, and a general commentary on how cacheable the object is.<br />
CacheabilityEngine/1.30 &lt;http://www.mnot.net/cacheability/&gt;</p>
<p><strong>[InternetLinkAgent/*]</strong><br />
InternetLinkAgent<br />
It appears to be a piece of free Japanese software that queries multiple search engines and sorts the results for you.<br />
InternetLinkAgent/3.1</p>
<p><strong>[Nudelsalat/*]</strong><br />
Nudelsalat<br />
Noodle salad? It didn&#8217;t read robots.txt so it&#8217;s banned.<br />
Nudelsalat/5.3 (Windoofs eNTe)</p>
<p><strong>[WhizBang]</strong><br />
WhizBang<br />
Corporate Information Crawler</p>
<p><strong>[TheInformant*]</strong><br />
TheInformant<br />
Similar to WebTrends.</p>
<p><strong>[Patwebbot (http://www.herz-power.de/technik.html)]</strong><br />
Patwebbot<br />
Some type of crawler from Germany. As best I could tell from the site it&#8217;s just someone who wrote a bot to crawl the web with no real purpose in mind.<br />
Patwebbot (http://www.herz-power.de/technik.html)</p>
<p><strong>[JetBrains*]</strong><br />
Omea Pro<br />
From their website: Omea Pro is a powerful universal client for aggregating and organizing all kinds of information: emails, files, web links, RSS feeds, newsgroups, tasks, contacts, and even custom resource types that you define.<br />
JetBrains Omea Pro 1.0.3 (http://www.jetbrains.com/omea/)<br />
JetBrains Omea Pro 2.0 Release Candidate 5 (http://www.jetbrains.com/omea/)</p>
<p><strong>[nabot*]</strong><br />
Nabot<br />
Run by Korea Telecom.</p>
<p><strong>[moget/*]</strong><br />
Goo<br />
It is part of the &#8216;InfoBee&#8217; project. It is closely related to the regular Inktomi database but is branded as an alternative database. It grabs too many pages in a short period of time, which is why it&#8217;s in this category.<br />
moget/2.1 (moget@goo.ne.jp)</p>
<p><strong>[Ocelli/*]</strong><br />
Ocelli<br />
From their website: Ocelli is a Web crawler owned and operated by GlobalSpec®, the leading specialized search engine and information resource for the engineering community. Ocelli&#8217;s mission is to find and index web pages for The Engineering Web from GlobalSpec, a unique slice of the World Wide Web focusing solely on engineering and technical content.</p>
<p>Based on discussions I&#8217;ve seen on WebmasterWorld, this spider is not very good at finding the niche content it claims to be searching for. I have banned it from my sites, which have nothing to do with engineering, unless you want to count plastic model cars as engineering-related!<br />
Ocelli/1.2 (http://www.globalspec.com/Ocelli)<br />
Ocelli/1.3 (http://www.globalspec.com/Ocelli)</p>
<p><strong>[MapoftheInternet.com?(?http://MapoftheInternet.com)]</strong><br />
MapoftheInternet<br />
Does not read robots.txt.<br />
MapoftheInternet.com ( http://MapoftheInternet.com)</p>
<p><strong>[Webclipping.com]</strong><br />
Webclipping.com<br />
From their website: WebClipping provides clients with news, information, and rumors from every key online source that impacts their business. With critical information collected and delivered to them, decision-makers can spot threats and opportunities in time to act effectively while saving hours of manual research.<br />
Webclipping.com</p>
<p><strong>[Mozdex/0.7.2*]</strong><br />
Mozdex<br />
The URL in the user agent returns a 404. From their website: mozDex is a search engine seeded from the dmoz.org directory. mozDex uses open source search technologies to create an open and fair index.<br />
Mozdex/0.7.2 (Mozdex; http://www.mozdex.com/bot.html; spider@mozdex.com)<br />
Mozdex/0.7.2-dev (Mozdex; http://www.mozdex.com/bot.html; spider@mozdex.com)</p>
<p><strong>[NetCarta_WebMapper/*]</strong><br />
NetCarta_WebMapper<br />
Does not read robots.txt. Takes pages too quickly.</p>
<p><strong>[Clushbot/*]</strong><br />
Clushbot<br />
It still does not request robots.txt.<br />
Clushbot/3.1-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.13-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.16-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.18-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.2-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.21-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.23-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.24-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.3-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.31-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.33-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.38-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.41-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.42-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.47-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.48-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.49-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.5-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.50-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.52-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.53-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.57-Ajax ( http://www.clush.com/bot.html)<br />
Clushbot/3.58-Ajax ( http://www.clush.com/bot.html)<br />
Clushbot/3.59-Hector ( http://www.clush.com/bot.html)<br />
Clushbot/3.6-BinaryFury ( http://www.clush.com/bot.html)<br />
Clushbot/3.60-Peleus ( http://www.clush.com/bot.html)<br />
Clushbot/3.62-Laomedon ( http://www.clush.com/bot.html)<br />
Clushbot/3.9-BinaryFury ( http://www.clush.com/bot.html)</p>
<p><strong>[VengaBot/*]</strong><br />
VengaBot<br />
Did not read robots.txt. Appears to be a crawler for the Dutch CMS Caret Web Content Management. The IP Address is registered to them.<br />
VengaBot/1.00; Mozilla/5.0; Firefox/1.0.6 (X11; Linux i686; es)</p>
<p><strong>[SBIder/*]</strong><br />
SiteSell<br />
From their website: SiteSell is gathering a statistical representation of topics presented on the Web as a whole. Each Web page visited is categorized under the topics that it represents, allowing our customers to know the percentage of Web pages that are about any particular topic.<br />
SBIder/0.7 (SBIder; http://www.sitesell.com/sbider.html; http://support.sitesell.com/contact-support.html)<br />
SBIder/0.8-dev (SBIder; http://www.sitesell.com/sbider.html; http://support.sitesell.com/contact-support.html)</p>
<p><strong>[*Networking4all*]</strong><br />
Networking4all Bot<br />
German ISP that can&#8217;t seem to figure out redirects or how to make its bot retrieve my files, so it&#8217;s banned for hogging too many CPU cycles.<br />
SSLbot/1.0 (http://www.networking4all.com)<br />
verzamelgids.nl - Networking4all Bot/1.5<br />
verzamelgids.nl - Networking4all Bot/2.1</p>
<p><strong>[Miva (AlgoFeedback@miva.com)]</strong><br />
Miva<br />
Did not read robots.txt. From their website: Today we offer a range of products and services through our three industry-facing divisions - MIVA Media, MIVA Small Business and MIVA Direct - aimed at significantly enhancing an advertiser&#8217;s ability to improve ROI, further minimizing waste and uncertainty.<br />
Miva (AlgoFeedback@miva.com)</p>
<p><strong>[Mozilla/4.0 (compatible; MyFamilyBot/*)]</strong><br />
MyFamilyBot<br />
This bot apparently belongs to the parent company of Ancestry.com and other such sites. What are they crawling my sites for? And why are they taking disallowed files?<br />
Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.ancestry.com/learn/bot.aspx)<br />
Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)</p>
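<p>To back up a complaint like &#8220;why are they taking disallowed files?&#8221;, the paths a bot requested can be checked against the site&#8217;s own robots.txt. This Python sketch uses the standard <code>urllib.robotparser</code> to list the requests a given bot token was told not to make. The robots.txt rules and request paths are made-up examples, and pulling the paths out of a real access log is left out.</p>

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

# Made-up paths a bot requested, e.g. pulled from an access log.
REQUESTED = ["/index.html", "/private/photos.html", "/cgi-bin/search.pl", "/models/"]


def disallowed_requests(robots_txt, token, paths):
    """Return the paths that robots.txt told the bot with this token to skip."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(token, p)]


if __name__ == "__main__":
    for path in disallowed_requests(ROBOTS_TXT, "MyFamilyBot", REQUESTED):
        print("fetched despite Disallow:", path)
```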
<p><strong>[favorstarbot/*]</strong><br />
favorstarbot<br />
Didn&#8217;t read robots.txt until well into its crawl.<br />
http://favorstar.com/bot.html</p>
<p><strong>[metatagsdir/*]</strong><br />
metatagsdir<br />
Does not read robots.txt.<br />
http://metatagsdir.com/</p>
<h2>General RSS</h2>
<p><strong>[Mozilla/5.0 (compatible) GM RSS Panel]</strong><br />
RSS Panel<br />
From their website: RSS Panel is designed as a generic Greasemonkey user script for any website. Its purpose is to display a little floating panel at the left hand top of any web page, for which an RSS feed is available from the same domain.<br />
http://www.xs4all.nl/~jlpoutre/BoT/Javascript/RSSpanel/</p>
<p><strong>[Mozilla/5.0 http://www.inclue.com; graeme@inclue.com]</strong><br />
Inclue<br />
Inclue supposedly went out of business. I&#8217;m not sure what purpose this bot serves. It did not read robots.txt.<br />
http://www.inclue.com/</p>
<h2>Google</h2>
<p><strong>[googlebot-urlconsole]</strong><br />
googlebot-urlconsole<br />
This is Google&#8217;s service for requesting that they remove a URL from their index.</p>
<p><strong>[Mozilla/4.0 (compatible; GoogleToolbar*)]</strong><br />
Google Toolbar<br />
Pre-fetches links and wastes a lot of bandwidth. My bandwidth, so of course Google doesn&#8217;t care. But I do. Banned.<br />
Mozilla/4.0 (compatible; GoogleToolbar 1.1.70-deleon; Windows 2000 5.0)<br />
Mozilla/4.0 (compatible; GoogleToolbar 2.0.111-big; Windows XP 5.1)<br />
Mozilla/4.0 (compatible; GoogleToolbar 2.0.114.10-deleon; Windows 98 SE 4.10)<br />
Mozilla/4.0 (compatible; GoogleToolbar 3.0.128.1-big; Windows XP 5.1)<br />
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-big; Windows XP 5.1)<br />
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-big; Windows XP 5.1; Google-TR-1)<br />
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-big; Windows XP 5.1; Google-TR-3)<br />
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-deleon; Windows Me 4.90)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.2378-big; Windows XP 5.1; MSIE 6.0.2900.2180)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows 2000 5.0; MSIE 6.0.2800.1106)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows 5.2; MSIE 6.0.3790.1830)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 6.0.2600.0000)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 6.0.2800.1106)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 6.0.2900.2180)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 7.0.5450.4)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 7.0.5700.6)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5764-big; Windows XP 5.1; MSIE 6.0.2900.2180)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.513.2948-big; Windows XP 5.1; MSIE 6.0.2900.2180)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.629.4924-big; Windows XP 5.1; MSIE 6.0.2900.2180)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.629.4924-big; Windows XP 5.1; MSIE 7.0.5450.4)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.629.4924-big; Windows XP 5.1; MSIE 7.0.5700.6)<br />
Mozilla/4.0 (compatible; GoogleToolbar 4.0.917.1454-big; Windows XP 5.1; MSIE 6.0.2900.2180)</p>
<p><strong>[Feedfetcher-Google;*]</strong><br />
Feedfetcher-Google<br />
Feedfetcher is how Google grabs RSS or Atom feeds when users choose to add them to their Google homepage.<br />
Feedfetcher-Google; ( http://www.google.com/feedfetcher.html)</p>
<h2>Hatena</h2>
<p><strong>[Hatena Bookmark/*]</strong><br />
Hatena Bookmark<br />
Appears to be a Japanese link checker and bookmarks manager.<br />
Hatena Bookmark/0.1<br />
Hatena Bookmark/0.1 (http://b.hatena.ne.jp; 1 users)</p>
<h2>HTML Validators</h2>
<p><strong>[Weblide/2.0 beta8*]</strong><br />
Weblide<br />
XHTML XML validator.<br />
Weblide/2.0 beta8 (http://alexandre.alapetite.net/distribution/weblide/; Microsoft Windows NT 5.1.2600 Service Pack 2; .NET 2.0.50727.42; fr-FR; FRA)</p>
<p><strong>[W3C_Validator/*]</strong><br />
W3C Validator<br />
W3C&#8217;s HTML Validation Service<br />
W3C_Validator/1.183 libwww-perl/5.64<br />
W3C_Validator/1.305 libwww-perl/5.64<br />
W3C_Validator/1.305.2.109 libwww-perl/5.79<br />
W3C_Validator/1.305.2.12 libwww-perl/5.64<br />
W3C_Validator/1.305.2.137 libwww-perl/5.79<br />
W3C_Validator/1.305.2.148 libwww-perl/5.800<br />
W3C_Validator/1.305.2.148 libwww-perl/5.803<br />
W3C_Validator/1.432.2.10</p>
<p><strong>[Jigsaw/* W3C_CSS_Validator_JFouffa/*]</strong><br />
Jigsaw CSS Validator<br />
Similar to the W3C HTML validator except it&#8217;s for CSS.<br />
Jigsaw/2.2.0 W3C_CSS_Validator_JFouffa/2.0<br />
Jigsaw/2.2.3 W3C_CSS_Validator_JFouffa/2.0</p>
<h2>Hurricane Electric</h2>
<p><strong>[Mozilla/5.0 (Twiceler-*]</strong><br />
Twiceler<br />
Cuill&#8217;s crawler, grouped here because it crawls from Hurricane Electric&#8217;s network.<br />
http://www.cuill.com/twiceler/robot.html</p>
<p><strong>[Twiceler*]</strong><br />
Twiceler<br />
Claims to be an experimental bot. It&#8217;s actually yet another crawler from the disgusting depths of Hurricane Electric.<br />
http://www.cuill.com/twiceler/<br />
Twiceler www.cuill.com/robots.html</p>
<p><strong>[Mozilla/4.04 (compatible; Dulance bot;*)]</strong><br />
Dulance<br />
No robots.txt. From their website: Dulance is a completely automated price comparison engine covering virtually all online merchants in North America.<br />
Mozilla/4.04 (compatible; Dulance bot; http://www.dulance.com/bot.jsp)</p>
<h2>iaskspider</h2>
<p><strong>[iaskspider]</strong><br />
iaskspider<br />
This is not from iask.com.cn.</p>
<h2>Iceweasel</h2>
<p><strong>[Iceweasel]</strong><br />
Iceweasel<br />
IceWeasel is the GNU version of the Firefox browser.<br />
http://www.gnu.org/software/gnuzilla/</p>
<h2>IE 6.0</h2>
<p><strong>[Mozilla/4.0 (compatible; MSIE 6.0; *Windows NT 6.0;*.NET CLR 2*)*]</strong><br />
IE</p>
<p>Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 6.0; .NET CLR 2.0.31113)</p>
<h2>IE 7.0</h2>
<p><strong>[Mozilla/4.0 (compatible; MSIE 7.0; *Windows NT 6.0;*.NET CLR 2*)*]</strong><br />
IE</p>
<p>Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; .NET CLR 2.0.50727; SL Commerce Client v1.0; Media Center PC 5.0)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04306; Media Center PC 5.0)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04320)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727; FDM)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727; InfoPath.1)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727; InfoPath.2)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04320)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04320)<br />
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04320; InfoPath.2)</p>
<h2>IE 7.0b</h2>
<p><strong>[Mozilla/4.0 (compatible; MSIE 7.0b; *Windows NT 6.0;*.NET CLR 2*)*]</strong><br />
IE</p>
<p>Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50215; SL Commerce Client v1.0; Tablet PC 2.0)<br />
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50215; SL Commerce Client v1.0; Tablet PC 2.0; Avalon 6.0.4030)<br />
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50215; SL Commerce Client v1.0; Tablet PC 2.0; Avalon 6.0.4030; WinFX RunTime 1.0.50215)<br />
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50727; SL Commerce Client v1.0; Tablet PC 2.0; Media Center PC 3.1; Media Center PC $(runtime.Emerald_version))</p>
<h2>Ilse</h2>
<p><strong>[Ilse]</strong><br />
Ilse<br />
Dutch search engine. Their bot appears to be well-behaved and gets good comments on WebmasterWorld.</p>
<h2>Image Crawlers</h2>
<p><strong>[Mozilla/5.0 (Macintosh; U; *Mac OS X; *) AppleWebKit/* (*) Pandora/2.*]</strong><br />
Pandora<br />
From their website: The image collector&#8217;s web spider and search agent for Mac OS X.<br />
http://www.positivespinmedia.com/shareware/Pandora/</p>
<p><strong>[HTML2JPG Blackbox, http://www.html2jpg.com]</strong><br />
HTML2JPG<br />
Takes screenshots of websites, which is nice. The downside is that the program can run in batch mode, which makes it a potential image ripper.<br />
HTML2JPG Blackbox, http://www.html2jpg.com</p>
<p><strong>[Camcrawler*]</strong><br />
Camcrawler<br />
No robots.txt. From their website: The data collected from the crawler is used to find and index webcam pages and images all over the internet.</p>
<p><strong>[pixfinder/*]</strong><br />
pixfinder<br />
Image stealer.</p>
<p><strong>[*PhotoStickies/*]</strong><br />
PhotoStickies<br />
Used for grabbing webcam images, often against website TOS.</p>
<p><strong>[rssImagesBot/0.1 (*http://herbert.groot.jebbink.nl/?app=rssImages)]</strong><br />
rssImagesBot<br />
Herbert Jebbink&#8217;s website. The image bot does not read robots.txt.<br />
rssImagesBot/0.1 ( http://herbert.groot.jebbink.nl/?app=rssImages)</p>
<h2>Inktomi</h2>
<p><strong>[Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)]</strong><br />
Yahoo! Slurp China<br />
I wish I could ban this bot, but it uses the same robots.txt name as all the other Inktomi bots!<br />
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)</p>
<p><strong>[YahooSeeker/*]</strong><br />
YahooSeeker<br />
This is Yahoo&#8217;s user agent for indexing mobile content.<br />
YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)<br />
YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/yahooseeker.html)<br />
YahooSeeker/1.1 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)<br />
YahooSeeker/1.2 (compatible; Mozilla 4.0; MSIE 5.5; yahooseeker at yahoo-inc dot com ; http://help.yahoo.com/help/us/shop/merchant/)</p>
<p><strong>[Mozilla/4.0 (compatible; Yahoo Japan; for robot study; kasugiya)]</strong><br />
Yahoo! RobotStudy<br />
Did not read robots.txt, which is typical of Y! bots these days.<br />
Mozilla/4.0 (compatible; Yahoo Japan; for robot study; kasugiya)</p>
<p><strong>[Yahoo Pipes*]</strong><br />
Yahoo Pipes<br />
Pipes is a hosted service that lets you remix feeds and create new data mashups in a visual programming environment.<br />
http://pipes.yahoo.com/</p>
<h2>Internet Archive</h2>
<p><strong>[*heritrix*]</strong><br />
Heritrix<br />
From the website: Heritrix is the Internet Archive’s web crawler which was specially designed for web archiving. Me again: It&#8217;s available to anyone who wants to download it and abuse it. That&#8217;s why I&#8217;ve banned it.<br />
http://en.wikipedia.org/wiki/Heritrix<br />
http://crawler.archive.org/<br />
mozilla/5.0 (compatible; heritrix/1.0.4 http://non-exist.com)<br />
mozilla/5.0 (compatible; heritrix/1.2.0 http://lab.mokk.bme.hu/members/bridge/)<br />
mozilla/5.0 (compatible; heritrix/1.3.0 http://archive.crawler.org)<br />
mozilla/5.0 (compatible; heritrix/1.3.0 http://crawler.archive.org)<br />
Mozilla/5.0 (compatible; heritrix/1.3.0 http://www.l3s.de/)<br />
Mozilla/5.0 (compatible; heritrix/1.4.0 http://www.chepi.net)<br />
Mozilla/5.0 (compatible; heritrix/1.4.0 PROJECT_URL_HERE)<br />
Mozilla/5.0 (compatible; heritrix/1.5 http://www.metacarta.com)<br />
Mozilla/5.0 (compatible; heritrix/1.5.0 http://www.l3s.de/~kohlschuetter/projects/crawling/)<br />
Mozilla/5.0 (compatible; heritrix/1.6.0 http://innovationblog.com)<br />
Mozilla/5.0 (compatible; heritrix/1.8.0 http://wiki.office.aol.com/wiki/SEO)<br />
os-heritrix/0.5.0 ( http://crawler.archive.org)</p>
<p><strong>[InternetArchive/*]</strong><br />
InternetArchive<br />
Unsure exactly what this new user agent is doing. Some report that it disregards robots.txt, but on my sites it&#8217;s been well behaved.<br />
InternetArchive/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)</p>
<h2>iSiloX</h2>
<p><strong>[iSiloX]</strong><br />
iSiloX<br />
From their website: iSiloX is the desktop application that converts content to the iSilo 3.x/4.x document format, enabling you to carry that content on your Palm OS PDA, Pocket PC PDA, Windows CE Handheld PC, or Windows computer for viewing using iSilo.</p>
<h2>iVia Project</h2>
<p><strong>[iVia Project]</strong><br />
iVia Project<br />
From their website: The iVia software is used in a range of projects, including the iVia Virtual Library Software which creates and manages Virtual Libraries, both automatically, and under the direct control of living, breathing, human librarians. iVia can be used to download pages from other Web sites.<br />
http://ivia.ucr.edu/useragents.shtml</p>
<h2>Jakarta Project</h2>
<p><strong>[Jakarta Project]</strong><br />
Jakarta Project<br />
From Wikipedia: The Jakarta Project creates and maintains open source software for the Java platform. It operates as an umbrella project under the auspices of the Apache Software Foundation, and all of Jakarta products are released under the Apache License.<br />
http://jakarta.apache.org/</p>
<h2>Jayde Online</h2>
<p><strong>[exactseek-pagereaper-* (crawler@exactseek.com)]</strong><br />
exactseek-pagereaper<br />
ExactSeek is a Meta-tag search engine. Your site will not be added if it does not have Title and Meta Description tags.<br />
exactseek-pagereaper-2.63 (crawler@exactseek.com)</p>
<p><strong>[ExactSeek Crawler/*]</strong><br />
ExactSeek Crawler<br />
ExactSeek is a Meta-tag search engine. Your site will not be added if it does not have Title and Meta Description tags.</p>
<h2>K-Meleon</h2>
<p><strong>[K-Meleon]</strong><br />
K-Meleon<br />
K-Meleon is a lightweight, open-source web browser for Windows built on Mozilla&#8217;s Gecko layout engine.</p>
<h2>Konqueror</h2>
<p><strong>[Konqueror]</strong><br />
Konqueror<br />
I am only supporting valid Konqueror user agents. If you&#8217;ve modified your user agent so it no longer matches the standard please don&#8217;t complain to me about it.</p>
<h2>Link Checkers</h2>
<p><strong>[*Zeus*]</strong><br />
Zeus<br />
From their website: Using link-building programs that query or use the search engines for finding web sites, data or information can penalize or even get your web site banned.<br />
Zeus 14530 Webster Pro V2.9 Win32<br />
Zeus 15180 Webster Pro V2.9 Win32<br />
Zeus 15355 Webster Pro V2.9 Win32<br />
Zeus 19850 Webster Pro V2.9 Win32<br />
Zeus 2.6<br />
Zeus 21628 Webster Pro V2.9 Win32<br />
Zeus 27567 Webster Pro V2.9 Win32<br />
Zeus 30979 Webster Pro V2.9 Win32<br />
Zeus 31264 Webster Pro V2.9 Win32<br />
Zeus 35520 Webster Pro V2.9 Win32<br />
Zeus 40201 Webster Pro V2.9 Win32<br />
Zeus 43271 Webster Pro V2.9 Win32<br />
Zeus 47063 Webster Pro V2.9 Win32<br />
Zeus 47844 Webster Pro V2.9 Win32<br />
Zeus 49814 Webster Pro V2.9 Win32<br />
Zeus 50267 Webster Pro V2.9 Win32<br />
Zeus 54093 Webster Pro V2.9 Win32<br />
Zeus 68378 Webster Pro V2.9 Win32<br />
Zeus 73457 Webster Pro V2.9 Win32<br />
Zeus 7393 Webster Pro V2.9 Win32<br />
Zeus 75505 Webster Pro V2.9 Win32<br />
Zeus 7913 Webster Pro V2.9 Win32<br />
Zeus 86701 Webster Pro V2.9 Win32<br />
Zeus 95389 Webster Pro V2.9 Win32<br />
Zeus 96481 Webster Pro V2.9 Win32<br />
Zeus ThemeSite Viewer Webster Pro V2.9 Win32<br />
Zeusbot/0.07 (Ulysseek&#8217;s web-crawling robot; http://www.zeusbot.com; agent@zeusbot.com)</p>
<p><strong>[!Susie (http://www.sync2it.com/susie)]</strong><br />
!Susie<br />
Social bookmarking: what a stupid phrase! This is a link checker. See also just plain Susie in this same section.<br />
!Susie (http://www.sync2it.com/susie)</p>
<p><strong>[Bookdog/*]</strong><br />
Bookdog<br />
From their website: Bookdog can sort, organize, eliminate duplicates, automatically verify, migrate and synchronize bookmarks between Safari, Camino, Firefox, OmniWeb and Opera.<br />
http://www.sheepsystems.com/products/bookdog/</p>
<p><strong>[JRTwine Software Check Favorites Utility]</strong><br />
JRTwine<br />
This bot is checking my downloads page which is a violation of my TOS so it&#8217;s banned.</p>
<p><strong>[FavOrg]</strong><br />
FavOrg<br />
A favorites manager utility from PC Magazine.<br />
FavOrg</p>
<p><strong>[RPT-HTTPClient/*]</strong><br />
RPT-HTTPClient<br />
Not sure what this is doing, but it didn&#8217;t read robots.txt first. This agent usually appears at the end of a longer user agent string, alongside something like JCheckLinks.<br />
RPT-HTTPClient/0.3-3<br />
RPT-HTTPClient/0.3-3E</p>
<p><strong>[Link Valet Online*]</strong><br />
Link Valet<br />
From their website: Link Valet is a WWW Link checker. When you enter the URL of an HTML page on the Web, it will fetch the page, and print a report on it. Link Valet will also spider your site.<br />
Link Valet Online 1.1</p>
<p><strong>[CheckLinks/*]</strong><br />
CheckLinks<br />
This does more than the name implies. It can strip entire websites.</p>
<p><strong>[Funnel Web Profiler*]</strong><br />
Funnel Web Profiler<br />
A legitimate site mapping tool that is often abused.</p>
<p><strong>[Mozilla/4.0 (compatible; SuperCleaner*;*)]</strong><br />
SuperCleaner<br />
Finds and removes websites from your Favorites list that are no longer working.<br />
Mozilla/4.0 (compatible; SuperCleaner 2.57; Windows NT 5.1)<br />
Mozilla/4.0 (compatible; SuperCleaner 2.67; Windows NT 5.1)<br />
Mozilla/4.0 (compatible; SuperCleaner 2.75; Windows NT 5.1)<br />
Mozilla/4.0 (compatible; SuperCleaner 2.84; Windows NT 5.1)<br />
Mozilla/4.0 (compatible; SuperCleaner 2.90; Windows NT 5.1)<br />
Mozilla/4.0 (compatible; SuperCleaner 2.93; Windows NT 5.1)</p>
<p><strong>[VSE/*]</strong><br />
VSE Link Tester<br />
The part of the user agent in parentheses contains custom text entered by each user of the product.<br />
VSE/1.0 (testcrawler@hotmail.com)<br />
VSE/1.0 (testcrawler@vivisimo.com)<br />
VSE/1.0 (vivisimolog@web121.com)<br />
VSE/1.0 (vsecrawler@hotmail.com)</p>
<p><strong>[Xenu* Link Sleuth*]</strong><br />
Xenu&#8217;s Link Sleuth<br />
This is, or at least can be, a very disrespectful and harmful link checker.<br />
Xenu Link Sleuth 1.1f<br />
Xenu Link Sleuth 1.2a<br />
Xenu Link Sleuth 1.2b<br />
Xenu Link Sleuth 1.2d<br />
Xenu Link Sleuth 1.2e<br />
Xenu Link Sleuth 1.2f<br />
Xenu Link Sleuth 1.2g<br />
Xenu Link Sleuth 1.2h<br />
Xenu&#8217;s Link Sleuth 1.0p<br />
Xenu&#8217;s Link Sleuth 1.1c</p>
<p><strong>[SiteBar/*]</strong><br />
SiteBar<br />
This is a SourceForge bookmarks manager.<br />
SiteBar/3.2.6<br />
SiteBar/3.3.2 (Bookmark Server; http://sitebar.org/)<br />
SiteBar/3.3.3 (Bookmark Server; http://sitebar.org/)<br />
SiteBar/3.3.5 (Bookmark Server; http://sitebar.org/)</p>
<p><strong>[Z-Add Link Checker*]</strong><br />
Z-Add Link Checker<br />
Web page that lets you check a URL.<br />
Z-Add Link Checker (http://w3.z-add.co.uk/linkcheck/)</p>
<p><strong>[Mozilla/4.0 (compatible; smartBot/1.*; checking links; *)]</strong><br />
smartBot<br />
The UA indicates it&#8217;s a link checker. On my sites all it did was send HEAD requests for my sitemaps.<br />
http://www.smartbot.com.au/</p>
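<p>A link checker that only needs to know whether a URL still resolves can ask for the status line and headers alone with a HEAD request, which fits what smartBot did here. A minimal sketch using Python&#8217;s standard urllib.request; the default user agent string and the function names are placeholders, not smartBot&#8217;s actual code:</p>

```python
import urllib.error
import urllib.request


def build_head_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a HEAD request: the server answers with status and headers, no body."""
    return urllib.request.Request(url, method="HEAD",
                                  headers={"User-Agent": user_agent})


def check_link(url: str, user_agent: str = "ExampleLinkChecker/0.1",
               timeout: float = 10.0) -> int:
    """Return the HTTP status code for url (this performs a network request)."""
    request = build_head_request(url, user_agent)
    try:
        # urlopen follows redirects by default.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report their code.
        return err.code
```

Because no body is transferred, a HEAD check like this shows up in logs as a single cheap request per URL.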
<p><strong>[DocWeb Link Crawler (http://doc.php.net)]</strong><br />
DocWeb Link Crawler<br />
PHP&#8217;s documentation link checker.<br />
DocWeb Link Crawler (http://doc.php.net)</p>
<p><strong>[Mozilla/5.0 gURLChecker/*]</strong><br />
gURLChecker<br />
From their website: gURLChecker is a graphical web links checker for GNU/Linux and other POSIX OS. It can work on a whole site, a single local page or a browser bookmarks file. From my perspective it&#8217;s being used to automate checks of my downloads page which is a violation of my TOS so it&#8217;s banned.<br />
Mozilla/5.0 gURLChecker/0.8.0 ssl (Linux)</p>
<p><strong>[Mozilla/4.0 (Compatible); URLBase*]</strong><br />
URLBase<br />
Bookmarks manager. Seems like a nice product but it&#8217;s being used to check my downloads page in violation of my TOS.<br />
Mozilla/4.0 (Compatible); URLBase 6</p>
<p><strong>[Mozilla/4.0 (compatible; Link Utility; http://net-promoter.com)]</strong><br />
NetPromoter Link Utility<br />
From their website: Link Utility is a powerful site management and link checker tool that helps webmasters automate the process of web site testing.<br />
Mozilla/4.0 (compatible; Link Utility; http://net-promoter.com)</p>
<p><strong>[Susie (http://www.sync2it.com/bms/susie.php]</strong><br />
Susie<br />
Social bookmarking website. From their website: Susie, Sync2It&#8217;s automated librarian, visits each of the websites bookmarked by our active user community right after it is uploaded to our server.<br />
Susie (http://www.sync2it.com/bms/susie.php</p>
<p><strong>[onCHECK Linkchecker von www.scientec.de fuer www.onsinn.de]</strong><br />
onCHECK Linkchecker<br />
Seems to be a link checker but the only information I can find is in German.<br />
onCHECK Linkchecker von www.scientec.de fuer www.onsinn.de</p>
<p><strong>[Robozilla/*]</strong><br />
Robozilla<br />
Visits sites listed in the Open Directory Project (ODP) to verify they&#8217;re still functional.<br />
Robozilla/1.0</p>
<p><strong>[ActiveBookmark *]</strong><br />
ActiveBookmark<br />
Its main feature, according to their website, is the ability to bookmark a specific place within a page.<br />
ActiveBookmark 1.0<br />
ActiveBookmark 1.1</p>
<h2>Lycoris Desktop/LX</h2>
<p><strong>[Lycoris Desktop/LX]</strong><br />
Lycoris Desktop/LX<br />
Lycoris bills itself as a Linux-based but easy-to-use desktop alternative to Windows, including a Mozilla-based web browser.</p>
<h2>Media Players</h2>
<p><strong>[vobsub]</strong><br />
vobsub<br />
vobsub is a plug-in for VirtualDub that allows you to rip subtitles from DVD VOB files and to use the provided DirectShow filter for DivX playback with subtitles.</p>
<h2>Microsoft</h2>
<p><strong>[Microsoft BITS/*]</strong><br />
BITS<br />
BITS is a system service that applications can use to transfer files asynchronously between a client and an HTTP server.<br />
Microsoft BITS/6.7</p>
<h2>Microsoft_Internet_Explorer</h2>
<p><strong>[Microsoft_Internet_Explorer]</strong><br />
Microsoft_Internet_Explorer<br />
I have no idea what this is. It shows up with a variation on the basic user agent, tries to read zzrobots.txt, then scrapes my downloads page and leaves.</p>
<h2>Miscellaneous Browsers</h2>
<p><strong>[Mozilla/5.0 (Macintosh; ?; PPC Mac OS X;*) AppleWebKit/* (*) HistoryHound/*]</strong><br />
HistoryHound<br />
Used for going back to websites in History and Bookmarks. Supposedly works with any Mac browser.<br />
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) HistoryHound/1.9</p>
<p><strong>[SCEJ PSP BROWSER 0102pspNavigator]</strong><br />
Wipeout Pure<br />
Some sort of web browser for Sony&#8217;s PSP.<br />
SCEJ PSP BROWSER 0102pspNavigator</p>
<p><strong>[ogeb browser , Version 1.1.0]</strong><br />
ogeb browser<br />
I cannot find any info on this supposed browser. Maybe it&#8217;s a spoof. Either way it was not badly behaved.</p>
<p><strong>[Kopiczek/* (WyderOS*; *)]</strong><br />
Kopiczek<br />
Polish browser using WyderOS. I can&#8217;t find out anything more about it than that.</p>
<p><strong>[Mozilla/4.0 (compatible; ibisBrowser)]</strong><br />
ibisBrowser<br />
Japanese language web browser.</p>
<p><strong>[Mozilla/* (Win32;*Escape?*; ?)]</strong><br />
Escape<br />
Espial Escape, a Java browser with scalable configuration capabilities, can be set up to match the memory requirements of a wide range of devices. Escape allows developers to selectively disable support for certain Internet standards so that browsers can be tailored to run in very resource-constrained designs or offer full functionality on more powerful devices.<br />
Mozilla/4.61 [en] (Win32; Escape 4.8; U)<br />
Mozilla/4.61 [en] (Win32; Escape 5.03; I)<br />
Mozilla/4.76 [en] (Win32;Escape 4.8; U)<br />
Mozilla/4.76 [en] (Win32;Escape 5.03; U)</p>
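<p>The bracketed headings in this list are wildcard patterns in the browscap.ini style: * stands for any run of characters and ? for exactly one. A minimal sketch of that matching in Python, translating a pattern into an anchored regular expression; matching case-insensitively is an assumption of this sketch, not something the list specifies:</p>

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a browscap-style pattern (* = any run, ? = one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # brackets, parentheses, dots stay literal
    # Anchor both ends so the pattern must cover the whole user agent string.
    # Case-insensitive matching is an assumption of this sketch.
    return re.compile("^" + "".join(parts) + "$", re.IGNORECASE)


def matches(pattern: str, user_agent: str) -> bool:
    return pattern_to_regex(pattern).match(user_agent) is not None


escape = "Mozilla/* (Win32;*Escape?*; ?)"
print(matches(escape, "Mozilla/4.61 [en] (Win32; Escape 5.03; I)"))          # True
print(matches(escape, "Mozilla/4.76 [en] (Win32;Escape 4.8; U)"))            # True
print(matches(escape, "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))  # False
```

The leading * after "Win32;" is what lets one pattern cover both the spaced and unspaced Escape variants.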
<p><strong>[Sleipnir*]</strong><br />
Sleipnir<br />
A Japanese browser that can also be scripted, which turns it into a website stripper. For now it seems well behaved on my sites, so I won&#8217;t ban it. It is a wrapper and comes in versions for Trident and Gecko.<br />
Sleipnir<br />
Sleipnir Version 1.40<br />
Sleipnir Version 1.41<br />
Sleipnir Version 1.42<br />
Sleipnir Version 1.66<br />
Sleipnir/2.40<br />
Sleipnir/2.41<br />
Sleipnir/2.45<br />
Sleipnir/2.46<br />
Sleipnir/2.47</p>
<p><strong>[NetRecorder*]</strong><br />
NetRecorder<br />
Home page is no longer operational.</p>
<p><strong>[GreenBrowser]</strong><br />
GreenBrowser<br />
From their website: GreenBrowser is yet another IE based browser that offers tabbed, multi-page browsing and many additional features including grouped pages, ad filtering, search engine integration, privacy cleaner, form filler and much more.<br />
GreenBrowser</p>
<h2>Mozilla 1.9</h2>
<p><strong>[Mozilla 1.9]</strong><br />
Mozilla<br />
From their website: Gran Paradiso Alpha 1 is an early developer milestone for the next generation of Mozilla’s layout engine, Gecko 1.9.<br />
http://developer.mozilla.org/devnews/index.php/2006/12/08/gran-paradiso-alpha-1-now-available-for-download/</p>
<h2>NameProtect</h2>
<p><strong>[NPBot*]</strong><br />
NameProtect<br />
NPBot (NameProtect Bot) engages in crawling activity in search of a wide range of brand and other intellectual property violations that may be of interest to their clients. It does seem to read and respect robots.txt but I don&#8217;t want it crawling my site.<br />
NPBot<br />
NPBot (http://www.nameprotect.com/botinfo.html)<br />
NPBot-1/2.0<br />
NPBot-1/2.0 (http://www.nameprotect.com/botinfo.html)<br />
NPBot/3 (NPBot; http://www.nameprotect.com; npbot@nameprotect.com)</p>
<p><strong>[NP/*]</strong><br />
NameProtect<br />
NP (NameProtect) engages in crawling activity in search of a wide range of brand and other intellectual property violations that may be of interest to their clients. It does seem to read and respect robots.txt but I don&#8217;t want it crawling my site.<br />
NP/0.1 (NP; http://www.nameprotect.com; npbot@nameprotect.com)</p>
<h2>Naver</h2>
<p><strong>[Cowbot-* (NHN Corp*naver.com)]</strong><br />
Naver Cowbot<br />
Seems to be associated with naver.com.<br />
Cowbot-0.1.1 (NHN Corp. / 82-2-3011-1954 / nhnbot@naver.com)</p>
<p><strong>[Yeti/*]</strong><br />
Yeti<br />
Part of naver.com.</p>
<h2>NewsGator</h2>
<p><strong>[NewsGator/*]</strong><br />
NewsGator<br />
Did not request robots.txt and got caught in a bot trap.<br />
http://www.newsgator.com</p>
<p><strong>[NetNewsWire*/*]</strong><br />
NetNewsWire<br />
From their website: NetNewsWire is an easy-to-use RSS Web news reader for Mac OS X.<br />
NetNewsWire/2.0.1 (Mac OS X; http://ranchero.com/netnewswire/)</p>
<h2>Nutch</h2>
<p><strong>[CazoodleBot/*]</strong><br />
CazoodleBot<br />
This is Nutch in disguise!<br />
http://www.cazoodle.com</p>
<p><strong>[Nutch]</strong><br />
Nutch<br />
Does not read robots.txt. I have no idea what this company does. Their website is essentially a blank page.</p>
<p><strong>[LOOQ/0.1*]</strong><br />
LOOQ<br />
Claims to be Nutch (in disguise).<br />
LOOQ/0.1 alfa (LOOQ Crawler for european sites; http://looq.eu; root (at) looq dot eu)</p>
<h2>Offline Browsers</h2>
<p><strong>[*HTTrack*]</strong><br />
HTTrack<br />
From their website: HTTrack is a free (GPL, libre/open source) and easy-to-use offline browser utility.<br />
HTTrack Website Copier/3.0x (offline browser; web mirror utility)<br />
Mozilla/4.5 (compatible; HTTrack 2.0x; Windows 98)<br />
Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 2000)<br />
Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)</p>
<p><strong>[*Check&#038;Get*]</strong><br />
Check&#038;Get<br />
From their website: Check&#038;Get is handy and powerful bookmark manager and web monitoring program that lets you organize your browser bookmarks, check your favorite Internet pages and detect if their content has changed or has become unavailable. My comments: while it did not read any excluded files, neither did it consult robots.txt first to make sure of that. That&#8217;s why this UA is in the stripper category.<br />
Mozilla/2.0 compatible; Check&#038;Get 1.14 (Windows 98)<br />
Mozilla/2.0 compatible; Check&#038;Get 1.14 (Windows NT)<br />
Mozilla/4.0 (compatible; Check&#038;Get 3.0; Windows NT)</p>
<p><strong>[*TweakMASTER*]</strong><br />
TweakMASTER<br />
Claims to be an Internet connection optimizer and what amounts to an offline browser.<br />
Mozilla/3.0 (compatible; TweakMASTER 2.06784; Windows NT 5.1)<br />
Mozilla/3.0 (compatible; TweakMASTER 2.06788; Windows ME)<br />
Mozilla/3.0 (compatible; TweakMASTER)<br />
TweakMASTER 2.x</p>
<h2>Online Scanners</h2>
<p><strong>[Morfeus Fucking Scanner]</strong><br />
Morfeus Fucking Scanner<br />
Morfeus Fucking Scanner looking for PHP vulnerabilities from this data center: Coreix Limited Admin (COREIX-DS2).</p>
<p><strong>[Mozilla/4.0 (compatible; Trend Micro tmdr 1.*]</strong><br />
Trend Micro<br />
This is the HouseCall online virus scanner from Trend Micro.<br />
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1000)<br />
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1032)<br />
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1110)<br />
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1139)<br />
Mozilla/4.0 (compatible; Trend Micro tmdr 1.2-1003)</p>
<p><strong>[virus_detector*]</strong><br />
Secure Computing Corporation<br />
Sells anti-spam and security products. Not sure why they crawl my website.<br />
virus_detector (virus_harvester@securecomputing.com)<br />
virus_detector virus_harvester@securecomputing.com</p>
<p><strong>[Titanium 2005 (4.02.01)]</strong><br />
Panda Antivirus Titanium<br />
This appears to be Panda Antivirus Titanium 2005. Judging by the files it requested, it looks like a human browsing several of my websites. I do not understand why I&#8217;m seeing this user agent unless it&#8217;s spoofed.</p>
<h2>PeerFactory</h2>
<p><strong>[PeerFactory]</strong><br />
PeerFactory<br />
Java class. Very badly behaved crawler: it requested my index page hundreds of times before it was automatically banned.<br />
http://www.nextapp.com/platform/echo2/echo/doc/api/public/app/nextapp/echo2/app/util/PeerFactory.html</p>
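<p>An automatic ban of the kind that stopped PeerFactory can be approximated with a sliding-window counter per client: keep the timestamps of recent requests and ban any client that exceeds a threshold inside the window. A minimal sketch in Python; the class name and the default of 60 requests per 60 seconds are illustrative assumptions, not the settings used on this site:</p>

```python
from collections import defaultdict, deque


class RequestRateBanner:
    """Ban a client once it makes more than `limit` requests within `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.recent = defaultdict(deque)  # client -> timestamps inside the window
        self.banned = set()

    def record(self, client: str, timestamp: float) -> bool:
        """Record one request; return True if the client is (now) banned."""
        if client in self.banned:
            return True
        times = self.recent[client]
        times.append(timestamp)
        # Drop timestamps that have slid out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        if len(times) > self.limit:
            self.banned.add(client)
            del self.recent[client]  # no need to keep counting a banned client
        return client in self.banned


banner = RequestRateBanner(limit=5, window=10.0)
results = [banner.record("203.0.113.7", t * 0.5) for t in range(8)]
print(results)  # [False, False, False, False, False, True, True, True]
```

A client pacing itself, say one request every few seconds, never crosses the threshold, so only hammering like the hundreds of index-page hits above trips the ban.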
<h2>Pocket PC</h2>
<p><strong>[*(compatible; MSIE *.*; Windows CE; PPC; *)]</strong><br />
Pocket PC<br />
Siemens mobile devices running WinCE.</p>
<h2>Pogodak</h2>
<p><strong>[Mozilla/5.0 (compatible; TridentSpider/*)]</strong><br />
Pogodak!<br />
Used to be Trident Search; now Pogodak!<br />
Mozilla/5.0 (compatible; TridentSpider/3.1)</p>
<p><strong>[Pogodak]</strong><br />
Pogodak!<br />
The entire D class where this crawler comes from is banned for abusive behavior.</p>
<h2>Proxy Servers</h2>
<p><strong>[CE-Preload]</strong><br />
CE-Preload<br />
Cisco Content Engine<br />
CE-Preload</p>
<p><strong>[Mozilla/5.0 (compatible; del.icio.us-thumbnails/*; *) KHTML/* (like Gecko)]</strong><br />
Yahoo!<br />
I am sick of Yahoo&#8217;s open proxies, so I banned the entire YAHOO-3 netrange. This week alone my site got ripped by a dozen user agents from this proxy.</p>
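<p>Banning an entire netrange comes down to checking whether a client address falls inside one of a list of CIDR blocks. A minimal sketch with Python&#8217;s standard ipaddress module; the blocks below are reserved documentation ranges standing in for real netranges, which you would look up in a WHOIS record:</p>

```python
import ipaddress

# Placeholder blocks (RFC 5737 documentation ranges), not any real netrange.
BANNED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]


def is_banned(address: str) -> bool:
    """Return True if address lies inside any banned network."""
    ip = ipaddress.ip_address(address)
    # Membership is always False across IPv4/IPv6, so mixed lists are safe.
    return any(ip in network for network in BANNED_NETWORKS)


print(is_banned("192.0.2.45"))      # True
print(is_banned("198.51.100.200"))  # False: outside the /25
print(is_banned("203.0.113.9"))     # False
```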
<p><strong>[ProxyTester*]</strong><br />
ProxyTester<br />
Software that looks for proxy servers and uses them to surf more or less anonymously.<br />
ProxyTester</p>
<p><strong>[SurfControl]</strong><br />
SurfControl<br />
From their website: SurfControl helps companies stop unwanted content. Our highly sophisticated Content Filters understand Internet content, and put you back in control by filtering out the material you don&#8217;t want, so you can get to what you do want, when you want it.<br />
SurfControl</p>
<h2>Research Projects</h2>
<p><strong>[USyd-NLP-Spider*]</strong><br />
USyd-NLP-Spider<br />
It claims to read and respect robots.txt. It did read it, but it did not respect it. From their website: USyd-NLP-Spider gathers HTML pages for the purpose of research in Natural Language Processing at the School of Information Technologies, University of Sydney, Australia.<br />
USyd-NLP-Spider (http://www.it.usyd.edu.au/~vinci/bot.html)</p>
<p><strong>[woriobot*]</strong><br />
woriobot<br />
University of British Columbia Laboratory for Computational Sciences.<br />
http://www.worio.com/</p>
<p><strong>[CMS crawler (?http://buytaert.net/crawler/)]</strong><br />
Research Projects<br />
This is some university student who is heavily involved with Drupal. The URL in the UA is wrong. No robots.txt.</p>
<p><strong>[Taiga web spider]</strong><br />
Taiga<br />
Yet another annoying and badly behaved bot from the brilliant students at Brown University. It doesn&#8217;t request robots.txt until several minutes into the crawl, then it doesn&#8217;t respect disallowed files and winds up in a bot trap. I have their netrange banned in my firewall.</p>
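<p>A bot trap typically works by listing a path under Disallow in robots.txt, linking to it where no human would follow, and flagging any client that requests it anyway. A minimal sketch in Python that scans an ordered request log and reports which clients hit the trap and which did not ask for robots.txt before anything else; the trap path and the (client, path) log format are hypothetical:</p>

```python
TRAP_PATH = "/bot-trap/"  # hypothetical path, listed under Disallow in robots.txt


def audit(log):
    """log: iterable of (client, path) tuples in request order.

    Returns (trapped, skipped_robots): clients that requested the trap path,
    and clients whose first request was not /robots.txt.
    """
    first_path = {}
    trapped = set()
    for client, path in log:
        first_path.setdefault(client, path)  # remember each client's first request
        if path.startswith(TRAP_PATH):
            trapped.add(client)
    skipped_robots = {c for c, p in first_path.items() if p != "/robots.txt"}
    return trapped, skipped_robots


log = [
    ("198.51.100.4", "/robots.txt"),
    ("198.51.100.4", "/index.html"),
    ("203.0.113.8", "/index.html"),           # crawls before reading robots.txt
    ("203.0.113.8", "/robots.txt"),
    ("203.0.113.8", "/bot-trap/page.html"),   # then ignores the Disallow rule
]
trapped, skipped = audit(log)
print(sorted(trapped))  # ['203.0.113.8']
print(sorted(skipped))  # ['203.0.113.8']
```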
<p><strong>[wwwster/* (Beta, mailto:gue@cis.uni-muenchen.de)]</strong><br />
wwwster<br />
Probably a research bot. Sent an e-mail on 1/15/2006.<br />
wwwster/1.4 (Beta, mailto:gue@cis.uni-muenchen.de)</p>
<p><strong>[Forschungsportal/*]</strong><br />
Forschungsportal<br />
Used by the German Federal Ministry of Education and Research.</p>
<p><strong>[UofTDB_experiment* (leehyun@cs.toronto.edu)]</strong><br />
UofTDB Experiment<br />
Yet another research project from a university. This time it&#8217;s the University of Toronto.<br />
UofTDB_experiment (leehyun@cs.toronto.edu)</p>
<p><strong>[HooWWWer/*]</strong><br />
HooWWWer<br />
Crawler for a research service called Next Generation Information Retrieval. The author says all the right things about ethical crawling on his site, and it seems well behaved, although all it has done so far is read robots.txt. Still, because it&#8217;s a research project, I choose to ban it.<br />
HooWWWer/2.1.0 ( http://cosco.hiit.fi/search/hoowwwer/ | mailto:crawler-info&lt;at&gt;hiit.fi)<br />
HooWWWer/2.1.3 (debugging run) ( http://cosco.hiit.fi/search/hoowwwer/ | mailto:crawler-info&lt;at&gt;hiit.fi)<br />
HooWWWer/2.2.0 (debugging run) ( http://cosco.hiit.fi/search/hoowwwer/ | mailto:crawler-info&lt;at&gt;hiit.fi)</p>
<p><strong>[Amico Alpha * (*) Gecko/* AmicoAlpha/*]</strong><br />
Amico Alpha<br />
According to their website this organization dissolved in 2005. Maybe it&#8217;s coming back to life. Their crawler does not read robots.txt and fell into a spider trap.<br />
Amico Alpha 1.0 (Windows; U; Win98; de-DE; rv:1.1.1) Gecko/20051001 AmicoAlpha/1.0</p>
<h2>Rippers</h2>
<p><strong>[SiteParser/*]</strong><br />
SiteParser<br />
From their website: The SiteParser is a site indexer, it will index your web pages through the internet or locally on your hard drive. I&#8217;m banning it because the author asked people to restrict it to domains owned by the user but clearly that&#8217;s not happening.<br />
SiteParser/1</p>
<p><strong>[Mozilla/2.0 (compatible; NEWT ActiveX; Win32)]</strong><br />
NEWT ActiveX<br />
This used to be a product from Delphi but the product has been abandoned.<br />
Mozilla/2.0 (compatible; NEWT ActiveX; Win32)</p>
<p><strong>[Mozilla/4.0 (compatible; BorderManager*)]</strong><br />
Novell BorderManager<br />
For some reason I only see this user agent stealing photos from my various websites.<br />
Mozilla/4.0 (compatible; BorderManager 3.0)</p>
<p><strong>[AutoHotkey]</strong><br />
AutoHotkey<br />
From their website: AutoHotkey is a free, open-source utility for Windows that will let you automate almost anything by sending keystrokes and mouse clicks. You can write a mouse or keyboard macro by hand or use the macro recorder.<br />
http://www.autohotkey.com/</p>
<p><strong>[3wGet/*]</strong><br />
3wGet<br />
From their website: 3wGet is the powerful download manager and websites downloader. It is designed for downloading files and web servers from Internet with the best possible speed which your connection can give you. It&#8217;s achieved due to splitting downloading file onto several sections, each of which is downloading simultaneously.<br />
3wGet/151</p>
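<p>Segmented downloading of the kind 3wGet describes relies on HTTP Range requests: the file is split into byte ranges, each fetched over its own connection, and the pieces are joined in order. A minimal sketch of the splitting step in Python; the helper names and the four-way split are illustrative, and a real client would also need to confirm that the server honors range requests (a 206 Partial Content response):</p>

```python
def byte_ranges(total_size: int, parts: int):
    """Split total_size bytes into `parts` contiguous inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty segments
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_headers(total_size: int, parts: int):
    """One Range header value per segment, e.g. 'bytes=0-249'."""
    return [f"bytes={s}-{e}" for s, e in byte_ranges(total_size, parts)]


print(range_headers(1000, 4))
# ['bytes=0-249', 'bytes=250-499', 'bytes=500-749', 'bytes=750-999']
```

From a server&#8217;s point of view, this is why one such download shows up as several simultaneous requests for the same file.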
<p><strong>[Holmes/*]</strong><br />
Holmes<br />
Holmes is an easy-to-use addition to MacOS 8.5&#8217;s Sherlock which provides the user with the ability to create search sets with similar Internet search sites (plug-ins) grouped together.<br />
Holmes/1.0<br />
holmes/2.3<br />
holmes/2.4<br />
holmes/3.9 (onet.pl)</p>
<p><strong>[sherlock/*]</strong><br />
Sherlock<br />
Now, instead of tediously selecting Web search sites in Sherlock, simply select a set in No Shoot! Sherlock and launch Sherlock. Your set is now present in Sherlock without all the clutter of your remaining SRC files. Check the program site for more information.<br />
sherlock/1.0</p>
<p><strong>[OCN-SOC/*]</strong><br />
OCN-SOC<br />
Japanese page ripper<br />
OCN-SOC/1.0</p>
<p><strong>[CFNetwork/*]</strong><br />
CFNetwork<br />
I&#8217;m not positive about this because I can&#8217;t test it myself. Based on my research, this is the user agent sent when a Cocoa application fetches a web page through the NSURL loading classes (such as NSURLConnection), which are built on Apple&#8217;s CFNetwork framework.<br />
CFNetwork/0.9<br />
CFNetwork/1.1<br />
CFNetwork/10.4.3<br />
CFNetwork/10.4.4<br />
CFNetwork/129.10<br />
CFNetwork/129.13<br />
CFNetwork/129.16<br />
CFNetwork/4.0</p>
<p><strong>[URL2File/*]</strong><br />
URL2File<br />
From their website: URL2File is a free 32bit Windows console-mode application able to retrieve and save the content of a given World Wide Web or FTP URL to a local file.<br />
URL2File/2.0 (Win98)</p>
<p><strong>[libcurl-agent/*]</strong><br />
libcurl<br />
The multiprotocol file transfer library behind the cURL command-line tool.<br />
libcurl-agent/1.0</p>
<p><strong>[WinScripter iNet Tools]</strong><br />
WinScripter iNet Tools<br />
From their website: wsInetTools v0.3 beta: is a COM dll written in C++ that allows you to easily send email and download a web page and binary contents such as images, programs, etc.<br />
WinScripter iNet Tools</p>
<p><strong>[HttpSession]</strong><br />
HttpSession<br />
From their website: The servlet container uses this interface to create a session between an HTTP client and an HTTP server.<br />
HttpSession</p>
<p><strong>[httpunit/*]</strong><br />
HttpUnit<br />
Site tester being used as a site ripper. Does not read robots.txt.<br />
httpunit/1.5</p>
<p><strong>[Artera (Version *)]</strong><br />
Artera<br />
Internet Accelerator</p>
<p><strong>[PigBlock (Windows NT 5.1; U)*]</strong><br />
PigBlock<br />
PigBlock (Windows NT 5.1; U) [en]<br />
PigBlock (Windows NT 5.1; U) [en] Gecko</p>
<p><strong>[BasicHTTP/*]</strong><br />
BasicHTTP<br />
From their website: A full-featured HTTP socket for REALBasic.<br />
BasicHTTP/1.0</p>
<p><strong>[W3CRobot/*]</strong><br />
W3CRobot<br />
I don&#8217;t like automated agents using my downloads.asp page to check for browscap.ini updates!<br />
W3CRobot/5.4.0 libwww/5.4.0</p>
<p><strong>[3D-FTP/*]</strong><br />
3D-FTP<br />
From their website: 3D-FTP is FTP Client software helping you transfer files up to 20x faster over Internet.<br />
3D-FTP/7.0</p>
<p><strong>[POE-Component-Client-HTTP/*]</strong><br />
POE-Component-Client-HTTP<br />
From their website: an HTTP user-agent component<br />
POE-Component-Client-HTTP/0.510 (perl; N; POE; en; rv:0.510)<br />
POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)</p>
<p><strong>[Twisted PageGetter]</strong><br />
Twisted PageGetter<br />
This is a Python Twisted-based spider. It did not read robots.txt.<br />
Twisted PageGetter</p>
<p><strong>[DataCha0s/*]</strong><br />
DataCha0s<br />
All reports indicate this crawler is dedicated to finding programs with known vulnerabilities. In particular it seems to like web stats programs and gallery applications. It was originally located at http://datacha0s.50megs.com but that site no longer exists.<br />
DataCha0s/2.0</p>
<p><strong>[SBL-BOT*]</strong><br />
BlackWidow<br />
BlackWidow is a site scanner, a site mapping tool, a site ripper, a site mirroring tool, an offline browser. Use it to scan a site and create a complete profile of the site&#8217;s structure, files, E-mail addresses, external links and even link errors. BlackWidow will also scan HTTP sites, SSL sites (HTTPS) and FTP sites.<br />
SBL-BOT (http://sbl.net)</p>
<p><strong>[LeechFTP]</strong><br />
LeechFTP<br />
Seen coming from a Thai data center. LeechFTP as a project died in 1999, so I cannot imagine why anyone is still using it.<br />
http://en.wikipedia.org/wiki/LeechFTP<br />
http://www.asiagenial.com/en/</p>
<p><strong>[CobWeb/*]</strong><br />
CobWeb<br />
HTML editor that can also be used to rip websites.</p>
<p><strong>[hcat/*]</strong><br />
hcat<br />
A program that uses the Perl socket library to do simple HTTP operations.</p>
<p><strong>[Open Web Analytics Bot*]</strong><br />
Open Web Analytics Bot<br />
Originally a reporting system for WordPress. Now it can be used to crawl websites.</p>
<p><strong>[Snoopy*]</strong><br />
Snoopy<br />
From their website: Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms, for example.</p>
<p><strong>[Custo*]</strong><br />
Custo<br />
From their website: Capable of reading HTML, CSS, JavaScript, and Shockwave Flash, Custo allows you to quickly retrieve information about the structure of a Web site.<br />
Custo 1.7 (www.netwu.com)<br />
Custo 1.8 (www.netwu.com)<br />
Custo 1.9 (www.netwu.com)<br />
Custo 2.0 (www.netwu.com)</p>
<p><strong>[curl/*]</strong><br />
cURL<br />
From their website: Curl is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. Curl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, kerberos, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, http proxy tunneling and a busload of other useful tricks.<br />
curl/7.10.2 (powerpc-apple-darwin7.0) libcurl/7.10.2 OpenSSL/0.9.7b zlib/1.1.4<br />
curl/7.10.3 (i386-redhat-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.6b zlib/1.1.3<br />
curl/7.10.3 (i386-unknown-openbsd3.1) libcurl/7.10.3 OpenSSL/0.9.6b ipv6 zlib/1.1.4<br />
curl/7.10.3 (i586-mandrake-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.7a zlib/1.1.4<br />
curl/7.10.3 (i686-pc-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.7 zlib/1.1.4<br />
curl/7.10.4 (i386-redhat-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b ipv6 zlib/1.1.4<br />
curl/7.10.4 (i386-redhat-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b zlib/1.1.3<br />
curl/7.10.4 (i386-redhat-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b zlib/1.1.4<br />
curl/7.10.4 (i686-pc-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b zlib/1.1.3<br />
curl/7.10.5 (i386-unknown-openbsd3.1) libcurl/7.10.5 OpenSSL/0.9.6b ipv6 zlib/1.1.4<br />
curl/7.10.5 (i686-pc-linux-gnu) libcurl/7.10.5 OpenSSL/0.9.6b zlib/1.1.4<br />
curl/7.10.5 (i686-suse-linux) libcurl/7.10.5 OpenSSL/0.9.7b ipv6 zlib/1.1.4<br />
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.10.6 OpenSSL/0.9.7a ipv6 zlib/1.1.4<br />
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.10.6 OpenSSL/0.9.7a ipv6 zlib/1.2.0.7<br />
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.10.6 OpenSSL/0.9.7a ipv6 zlib/1.2.1.2<br />
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7a zlib/1.1.4<br />
curl/7.10.7 (i386-portbld-freebsd4.3) libcurl/7.10.7 OpenSSL/0.9.6g zlib/1.1.4<br />
curl/7.10.7 (i386-portbld-freebsd4.8) libcurl/7.10.7 OpenSSL/0.9.7a ipv6 zlib/1.1.4<br />
curl/7.10.7 (i586-mandrake-linux-gnu) libcurl/7.10.7 OpenSSL/0.9.7b zlib/1.1.4<br />
curl/7.10.7 (i686-pc-linux-gnu) libcurl/7.10.7 OpenSSL/0.9.7c zlib/1.1.4<br />
curl/7.10.7 (i686-redhat-linux-gnu) libcurl/7.10.7 OpenSSL/0.9.6b ipv6 zlib/1.1.4<br />
curl/7.10.8 (i686-pc-linux-gnu) libcurl/7.10.8 OpenSSL/0.9.6c zlib/1.1.4<br />
curl/7.10.8 (i686-pc-linux-gnu) libcurl/7.10.8 OpenSSL/0.9.7a ipv6 zlib/1.1.4<br />
curl/7.11.0 (i386-portbld-freebsd4.10) libcurl/7.11.0 OpenSSL/0.9.7d zlib/1.1.4<br />
curl/7.11.0 (i386-portbld-freebsd4.9) libcurl/7.11.0 OpenSSL/0.9.7c zlib/1.1.4<br />
curl/7.11.0 (i586-mandrake-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7c zlib/1.2.1<br />
curl/7.11.0 (i686-pc-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7a zlib/1.1.4<br />
curl/7.11.0 (i686-pc-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7d ipv6 zlib/1.2.1<br />
curl/7.11.0 (i686-suse-linux) libcurl/7.11.0 OpenSSL/0.9.7d ipv6 zlib/1.2.1<br />
curl/7.11.1 (i386-redhat-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a ipv6 zlib/1.1.4<br />
curl/7.11.1 (i386-redhat-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a ipv6 zlib/1.2.1.2<br />
curl/7.11.1 (i686-pc-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a zlib/1.1.4<br />
curl/7.11.1 (powerpc-apple-darwin7.5.0) libcurl/7.11.1 OpenSSL/0.9.7d ipv6 zlib/1.1.4<br />
curl/7.11.1 (powerpc-apple-darwin7.7.0) libcurl/7.11.1 OpenSSL/0.9.7d ipv6 zlib/1.1.4<br />
curl/7.11.2 (i686-pc-linux-gnu) libcurl/7.10.2 OpenSSL/0.9.6i ipv6 zlib/1.1.4<br />
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7a zlib/1.1.4<br />
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7d ipv6 zlib/1.2.1<br />
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7d ipv6 zlib/1.2.2<br />
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7e ipv6 zlib/1.2.2<br />
curl/7.12.1 (i386-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6<br />
curl/7.12.1 (i686-pc-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7d zlib/1.2.1<br />
curl/7.12.1 (i686-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6<br />
curl/7.12.2 (i386-pc-linux-gnu) libcurl/7.12.2 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.2<br />
curl/7.12.2 (i386-pc-win32) libcurl/7.12.2 zlib/1.2.1<br />
curl/7.12.3 (i386-redhat-linux-gnu) libcurl/7.12.3 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6<br />
curl/7.13.0 (i386-pc-linux-gnu) libcurl/7.13.0 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.2<br />
curl/7.13.1 (i386-pc-linux-gnu) libcurl/7.13.1 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.2<br />
curl/7.13.1 (i386-portbld-freebsd4.10) libcurl/7.13.1 OpenSSL/0.9.7d zlib/1.1.4<br />
curl/7.13.1 (i386-portbld-freebsd5.3) libcurl/7.13.1 OpenSSL/0.9.7g zlib/1.2.1<br />
curl/7.13.1 (i386-portbld-freebsd5.4) libcurl/7.13.1 OpenSSL/0.9.7e zlib/1.2.1<br />
curl/7.13.1 (i386-redhat-linux-gnu) libcurl/7.13.1 OpenSSL/0.9.7f zlib/1.2.2.2 libidn/0.5.15<br />
curl/7.13.1 (i686-pc-linux-gnu) libcurl/7.13.1 OpenSSL/0.9.7e zlib/1.2.2<br />
curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7b zlib/1.2.2<br />
curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7g zlib/1.2.3<br />
curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7i zlib/1.2.3<br />
curl/7.13.2 (i386-pc-linux-gnu) libcurl/7.13.2 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.13<br />
curl/7.13.2 (i686-pc-linux-gnu) libcurl/7.13.2 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.15<br />
curl/7.13.2 (i686-pc-linux-gnu) libcurl/7.13.2 OpenSSL/0.9.7e zlib/1.2.3 libidn/0.5.15<br />
curl/7.14.0 (i386-pc-linux-gnu) libcurl/7.14.0 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.13<br />
curl/7.14.0 (i386-portbld-freebsd4.9) libcurl/7.14.0 OpenSSL/0.9.7b zlib/1.1.4<br />
curl/7.14.0 (i386-portbld-freebsd5.4) libcurl/7.14.0 OpenSSL/0.9.7e zlib/1.2.1<br />
curl/7.14.0 (i386-portbld-freebsd6.0) libcurl/7.14.0 OpenSSL/0.9.7e zlib/1.2.2<br />
curl/7.14.0 (i486-pc-linux-gnu) libcurl/7.14.0 OpenSSL/0.9.7g zlib/1.2.3 libidn/0.5.13<br />
curl/7.14.1 (i386-portbld-freebsd4.7) libcurl/7.14.1 OpenSSL/0.9.8 zlib/1.1.3<br />
curl/7.15.0 (powerpc64-unknown-linux-gnu) libcurl/7.15.0 OpenSSL/0.9.7e zlib/1.2.3<br />
curl/7.15.1 (i386-portbld-freebsd4.11) libcurl/7.15.1 OpenSSL/0.9.7d zlib/1.1.4<br />
curl/7.15.1 (i586-pc-mingw32msvc) libcurl/7.15.1 zlib/1.2.2<br />
curl/7.15.1 (i586-trustix-linux-gnu) libcurl/7.15.1 OpenSSL/0.9.7i zlib/1.2.3<br />
curl/7.15.1 (x86_64-pc-linux-gnu) libcurl/7.15.1 OpenSSL/0.9.7j zlib/1.2.3<br />
curl/7.15.1 (x86_64-pc-linux-gnu) libcurl/7.15.1 zlib/1.2.3<br />
curl/7.15.3 (i386-portbld-freebsd6.0) libcurl/7.15.3 OpenSSL/0.9.7e zlib/1.2.2<br />
curl/7.15.3 (i686-pc-linux-gnu) libcurl/7.15.3 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6<br />
curl/7.15.4 (i686-pc-linux-gnu) libcurl/7.15.4 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.3<br />
curl/7.15.5 (i686-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5<br />
curl/7.15.5 (i686-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8a zlib/1.2.3 libidn/0.6.2<br />
curl/7.7.2 (powerpc-apple-darwin6.0) libcurl 7.7.2 (OpenSSL 0.9.6b)<br />
curl/7.7.2 (powerpc-apple-darwin6.0) libcurl 7.7.2 (OpenSSL 0.9.6e) (ipv6 enabled)<br />
curl/7.8 (i386-redhat-linux-gnu) libcurl 7.8 (OpenSSL 0.9.6b) (ipv6 enabled)<br />
curl/7.9 (i386-unknown-freebsd4.2) libcurl 7.9 (OpenSSL 0.9.6i) (ipv6 enabled)<br />
curl/7.9.2 (i386-redhat-linux-gnu) libcurl/7.10.3 zlib/1.1.3<br />
curl/7.9.2 (i386-redhat-linux-gnu) libcurl/7.10.3 zlib/1.1.4<br />
curl/7.9.5 (i386-redhat-linux-gnu) libcurl 7.9.5 (OpenSSL 0.9.6b) (ipv6 enabled)<br />
curl/7.9.5 (i586-pc-linux-gnu) libcurl 7.9.5 (ipv6 enabled)<br />
curl/7.9.5 (i586-pc-linux-gnu) libcurl 7.9.5 (OpenSSL 0.9.6a)<br />
curl/7.9.5 (i586-pc-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.6c ipv6 zlib/1.2.1<br />
curl/7.9.7 (i686-pc-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a ipv6 zlib/1.1.4<br />
curl/7.9.8 (i386--freebsd4.6) libcurl 7.9.8 (OpenSSL 0.9.6e)<br />
curl/7.9.8 (i386-portbld-freebsd4.1) libcurl 7.9.8<br />
curl/7.9.8 (i386-portbld-freebsd4.7) libcurl 7.9.8 (OpenSSL 0.9.6g) (ipv6 enabled)<br />
curl/7.9.8 (i386-redhat-linux-gnu) libcurl 7.9.8 (OpenSSL 0.9.6b)<br />
curl/7.9.8 (i386-redhat-linux-gnu) libcurl 7.9.8 (OpenSSL 0.9.7a) (ipv6 enabled)<br />
curl/7.9.8 (i386-unknown-freebsd4.6.2) libcurl 7.9.8 (OpenSSL 0.9.6)<br />
curl/7.9.8 (i686-pc-linux-gnu) libcurl 7.9.8 (OpenSSL 0.9.6b) (ipv6 enabled)</p>
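<p>In the list above, the curl/ and libcurl/ versions do not always agree (for example, curl/7.9.2 paired with libcurl/7.10.3). Older builds also use a different layout for the library details. The following is a minimal Python sketch for splitting these strings apart. It assumes only the two layouts shown above, and parse_curl_ua is a hypothetical helper, not something that ships with curl.</p>

```python
import re

# Hypothetical helper. Assumes the two layouts seen in the list above:
#   curl/7.10.3 (i686-pc-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.7 zlib/1.1.4
#   curl/7.9.8 (i686-pc-linux-gnu) libcurl 7.9.8 (OpenSSL 0.9.6b) (ipv6 enabled)
CURL_UA = re.compile(
    r"^curl/(?P<curl>[\d.]+) \((?P<platform>[^)]*)\) "
    r"libcurl[/ ](?P<libcurl>[\d.]+)(?P<rest>.*)$"
)


def parse_curl_ua(ua):
    """Split a curl user agent into its parts; return None if it does not match."""
    m = CURL_UA.match(ua)
    if m is None:
        return None
    rest = m.group("rest")
    # Newer layout: "OpenSSL/0.9.7a zlib/1.1.4"; older layout: "(OpenSSL 0.9.6b)".
    libraries = dict(re.findall(r"([A-Za-z]+)[/ ]([\d.]+[a-z]?)", rest))
    return {
        "curl": m.group("curl"),
        "platform": m.group("platform"),
        "libcurl": m.group("libcurl"),
        "libraries": libraries,
        "ipv6": "ipv6" in rest,
        # True when the command-line tool and the library report different versions.
        "mismatch": m.group("curl") != m.group("libcurl"),
    }


print(parse_curl_ua("curl/7.9.2 (i386-redhat-linux-gnu) libcurl/7.10.3 zlib/1.1.3")["mismatch"])  # True
```
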
<p><strong>[*WebGrabber*]</strong><br />
Rippers<br />
WebGrabber is a utility that you can use to mirror, copy, synchronize, download, scrub or &#8220;steal&#8221; a web site.<br />
www.substancia.com WebGrabber (ver 1.0)</p>
<p><strong>[*grub-client*]</strong><br />
grub-client<br />
They claim to read and respect robots.txt. I have seen no evidence of that myself.</p>
<p>Update: On March 23, 2004, a Grub client 1.07 (I didn&#8217;t think that was an official version; shouldn&#8217;t it be 1.0.7?) read robots.txt and then got caught in my trap, so it got no further.<br />
grub-client<br />
Mozilla/4.0 (compatible; grub-client-0.2.3; Crawl your stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-0.2.4; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.0.3; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.0.4; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.0.5; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.0.5; sponsored by <a href=http://www.cutecandy.com>www.cutecandy.com</a> and grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.0.6; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.0.7; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.1.1; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.2.1; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.3.1; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.3.7; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.4.3; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-1.5.3; Crawl your own stuff with http://grub.org)<br />
Mozilla/4.0 (compatible; grub-client-2.3)<br />
Mozilla/4.0 (compatible; grub-client-2.6.0)<br />
Mozilla/4.0 (compatible; grub-client-2.6.1)</p>
<p><strong>[CAST]</strong><br />
CAST<br />
Cast Software sells a data-mining application that can also mine websites.</p>
<p><strong>[b2w/*]</strong><br />
b2w<br />
Almost knocked WebmasterWorld offline with more than 6,000 requests in under an hour.</p>
<p><strong>[JPluck/*]</strong><br />
JPluck<br />
I am ashamed of SourceForge for allowing such a badly behaved piece of software to be made available through their website. It does not read robots.txt.<br />
JPluck/2.0.9 (Java 1.4.2_03; Windows XP)<br />
JPluck/2.1.1 (Java 1.4.2_04; Linux)<br />
JPluck/2.1.6 (Java 1.4.2_04; Linux)<br />
JPluck/2.1.6 (Java 1.4.2_04; Windows 2000)<br />
JPluck/2.1.6b (Java 1.4.2; Linux)<br />
JPluck/2.1.6b (Java 1.4.2_03; Windows XP)<br />
JPluck/2.1.6b (Java 1.4.2_04; Windows 2000)</p>
<p><strong>[Kapere (http://www.kapere.com)]</strong><br />
Kapere<br />
This is a download accelerator and &#8220;website grabber&#8221; (their terminology not mine) that does not read robots.txt.</p>
<p><strong>[ezic.com http agent *]</strong><br />
Ezic.com<br />
The IP resolves to NetBilling, Inc., but I have no idea what they&#8217;re doing. There is also a website at www.ezic.com, but again I have no idea what they might be up to.</p>
<p><strong>[LeechGet*]</strong><br />
LeechGet<br />
Download manager.<br />
LeechGet (www.leechget.net)<br />
LeechGet 2002 (www.leechget.de)<br />
LeechGet 2003 (www.leechget.net)<br />
LeechGet 2004 (www.leechget.net)<br />
LeechGet 2005 (www.leechget.net)</p>
<p><strong>[Mozilla/3.0 (compatible; Indy Library)]</strong><br />
Rippers<br />
This appears to be another automated agent checking my downloads.asp page instead of version.asp, which is the page my TOS says it should check.</p>
<p>Part of a Delphi/C++Builder suite of tools for doing Internet stuff. The second link is where I found out about this potentially nasty little bot.<br />
Mozilla/3.0 (compatible; Indy Library)</p>
<p><strong>[MovableType/*]</strong><br />
MovableType Web Log<br />
Why is someone&#8217;s blog reading the downloads page on my personal website? The answer doesn&#8217;t really matter: it didn&#8217;t read robots.txt first, so it&#8217;s banned.<br />
MovableType/2.51<br />
MovableType/2.63<br />
MovableType/2.64</p>
<p><strong>[fetch libfetch/*]</strong><br />
Rippers<br />
The IP belongs to the Australian Academic and Research Network. There is no indication of exactly what this user agent is.<br />
fetch libfetch/2.0</p>
<p><strong>[ScoutAbout*]</strong><br />
ScoutAbout<br />
Word processor that permits Internet search and rip.</p>
<p><strong>[Python*]</strong><br />
Python<br />
Python is an interpreted, interactive, object-oriented programming language. It is often compared to Tcl, Perl, Scheme or Java. Scripts built on its urllib library do not check robots.txt unless the programmer adds that step.<br />
Python RobotFileParser/2.0<br />
Python-urllib/1.15<br />
Python-urllib/1.16<br />
Python-urllib/2.0a1<br />
Python-urllib/2.1<br />
Python-urllib/2.4</p>
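<p>Python&#8217;s standard library does include a robots.txt parser: urllib.robotparser (the robotparser module in Python 2). A script honors robots.txt only if it calls that parser before fetching. The minimal sketch below assumes a made-up robots.txt and a hypothetical ExampleBot/1.0 user agent. A real crawler would download the file with set_url() and read() instead of parsing a string.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads.asp
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Ask before every request; can_fetch() returns False for disallowed paths.
print(rp.can_fetch("ExampleBot/1.0", "http://example.com/version.asp"))    # True
print(rp.can_fetch("ExampleBot/1.0", "http://example.com/downloads.asp"))  # False
```
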
<p><strong>[pavuk/*]</strong><br />
Pavuk<br />
From their website: Pavuk is a UNIX program used to mirror contents of WWW documents or files. It transfers documents from HTTP, FTP, Gopher and optionally from HTTPS (HTTP over SSL) servers. Pavuk has an optional GUI based on the GTK2 widget set.<br />
pavuk/0.9pl27 powerpc-apple-darwin1.3</p>
<p><strong>[PEAR HTTP_Request*]</strong><br />
PEAR-PHP<br />
Full user agent: PEAR HTTP_Request class<br />
PEAR HTTP_Request class ( http://pear.php.net/ )</p>
<p><strong>[WebSauger*]</strong><br />
WebSauger<br />
German version of HTTrack, a website copier.<br />
WebSauger 1.20b</p>
<p><strong>[SOFTWING_TEAR_AGENT*]</strong><br />
AspTear<br />
AspTear from AlphaSierraPapa, Inc. Rips pages from remote websites.<br />
SOFTWING_TEAR_AGENT<br />
SOFTWING_TEAR_AGENT_1_0</p>
<p><strong>[SuperHTTP/*]</strong><br />
SuperHTTP<br />
Discontinued according to the vendor but still available via other sites.<br />
SuperHTTP/1.0</p>
<p><strong>[IrssiUrlLog/*]</strong><br />
IrssiUrlLog<br />
Irssi-based URL grabber written by Thomas Graf. The URL listed for his website, http://irssi.reeler.org/url/, and the base URL both seem to be offline. As of August 21, 2005, DNSReports shows the domain is also blacklisted by DNSBL.SORBS.NET.<br />
IrssiUrlLog/0.2</p>
<p><strong>[ActiveRefresh*]</strong><br />
ActiveRefresh<br />
From their website: ActiveRefresh is a web news monitor which will save you browsing time by monitoring information sources, gathering all of the information that you regularly access and presenting it in a comfortable and adjustable way. My comments: It does not read robots.txt, and I think it should. That makes it a Website Stripper!</p>
<h2>SeaMonkey 1.0</h2>
<p><strong>[SeaMonkey 1.0]</strong><br />
SeaMonkey<br />
The SeaMonkey project is a community effort to deliver production-quality releases of code derived from the application formerly known as &#8220;Mozilla Application Suite&#8221; which has been discontinued.<br />
http://www.mozilla.org/projects/seamonkey/</p>
<h2>SeaMonkey 1.1</h2>
<p><strong>[SeaMonkey 1.1]</strong><br />
SeaMonkey<br />
The SeaMonkey project is a community effort to deliver production-quality releases of code derived from the application formerly known as &#8220;Mozilla Application Suite&#8221; which has been discontinued.</p>
<h2>Search Engines</h2>
<p><strong>[Mozilla/0.9* no dos :) (Linux)]</strong><br />
goliat<br />
Hungarian search engine. Did not read robots.txt.<br />
http://www.goliat.hu/</p>
<p><strong>[BigCliqueBOT/*]</strong><br />
BigClique.com/BigClic.com<br />
Seems to be well-behaved.<br />
BigCliqueBOT/1.03-dev (bigclicbot; http://www.bigclique.com; bot@bigclique.com)</p>
<p><strong>[GurujiBot/1.*]</strong><br />
GurujiBot<br />
Indian search engine based in Santa Clara, CA. Did not request robots.txt.</p>
<p><strong>[iSEEKbot/*]</strong><br />
iSEEKbot<br />
Search engine under development. It&#8217;s one of those bots that reads robots.txt (and, so far, obeys it) before every single page. That is quite an annoyance and hardly necessary.<br />
http://www.iseek.com/<br />
http://beta.iseek.com/iseekbot.html</p>
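<p>Rereading robots.txt before every page is unnecessary. A crawler can keep one parsed copy per host and refresh it only occasionally. The following is a minimal Python sketch. It assumes a hypothetical fetch_robots(host) function that returns the file&#8217;s lines, and it uses a one-day refresh interval chosen purely for illustration. RobotsCache is also a hypothetical name.</p>

```python
import time
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Keep one parsed robots.txt per host; re-fetch it only after `ttl` seconds."""

    def __init__(self, fetch_robots, ttl=86400, clock=time.time):
        self.fetch_robots = fetch_robots  # hypothetical: host -> list of robots.txt lines
        self.ttl = ttl
        self.clock = clock
        self.cache = {}  # host -> (time fetched, RobotFileParser)

    def parser_for(self, host):
        now = self.clock()
        entry = self.cache.get(host)
        # Fetch on first use, or when the cached copy is older than the TTL.
        if entry is None or now - entry[0] >= self.ttl:
            rp = RobotFileParser()
            rp.parse(self.fetch_robots(host))
            entry = (now, rp)
            self.cache[host] = entry
        return entry[1]

    def can_fetch(self, user_agent, host, path):
        """Check one path against the cached rules for its host."""
        return self.parser_for(host).can_fetch(user_agent, "http://%s%s" % (host, path))
```

<p>With this design, a crawl of a hundred pages on one site costs one robots.txt request rather than a hundred.</p>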
<p><strong>[Mozilla/5.0 (compatible; OsO;*]</strong><br />
Octopodus<br />
Yet another search engine under development. It would be nice if it read robots.txt. It would also be nice if it had a valid UA:<br />
Mozilla/5.0 (compatible; OsO; http://oso.octopodus.com/abot.html<br />
is not a valid UA. Replace the oso sub-domain with www and you&#8217;ll see this UA is actually from a translation service called Silurus.</p>
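<p>Whatever else is wrong with the OsO string above, it opens a parenthesis that it never closes. That particular defect is easy to check mechanically. Below is a minimal Python sketch of that single check. It is only one possible sign of a malformed user agent, and parens_balanced is a hypothetical name.</p>

```python
def parens_balanced(ua):
    """Return True if every '(' in the user agent has a matching ')'."""
    depth = 0
    for ch in ua:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == -1:  # a ')' with no '(' before it
                return False
    return depth == 0


print(parens_balanced("Mozilla/5.0 (compatible; OsO; http://oso.octopodus.com/abot.html"))  # False
print(parens_balanced("StackRambler/2.0 (MSIE incompatible)"))  # True
```
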
<p><strong>[Mozilla/5.0 (*) VoilaBot BETA 1.*]</strong><br />
VoilaBot<br />
French search engine. It read robots.txt only after several minutes of taking pages.<br />
<p><strong>[Abacho*]</strong><br />
Abacho<br />
Related to Eule-Robot<br />
ABACHOBot</p>
<p><strong>[Pompos/*]</strong><br />
Pompos<br />
Pompos is the spider used by dir.com, a French search engine.<br />
Pompos/1.2 http://pompos.iliad.fr<br />
Pompos/1.3 http://dir.com/pompos.html</p>
<p><strong>[StackRambler/*]</strong><br />
StackRambler<br />
Russian search engine.<br />
StackRambler/2.0 (MSIE incompatible)</p>
<p><strong>[Szukacz/*]</strong><br />
Szukacz<br />
From their website: Szukacz specializes in searching for documents prepared in the Polish language. However, Szukacz also runs searches against &#8220;The Best of the World&#8221; collection (&#8220;Swiat&#8221; in Polish).<br />
Szukacz/1.5 (robot; www.szukacz.pl/html/jak_dziala_robot.html; info@szukacz.pl)<br />
Szukacz/1.5 (robot; www.szukacz.pl/jakdzialarobot.html; info@szukacz.pl)</p>
<p><strong>[Pagebull http://www.pagebull.com/]</strong><br />
Pagebull<br />
Displays search results as thumbnails of pages.<br />
http://www.pagebull.com/</p>
<p><strong>[wadaino.jp-crawler*]</strong><br />
wadaino.jp<br />
Did not read robots.txt.</p>
<p><strong>[ALeadSoftbot/*]</strong><br />
ALeadSoftbot<br />
From their website: ALeadSoftbot is ALeadSoft&#8217;s web-crawling robot. It collects documents from the web to build a searchable index for a site search engine.<br />
http://www.aleadsoft.com/bot.htm</p>
<p><strong>[Sqeobot/0.*]</strong><br />
Branzel<br />
Did not read robots.txt.<br />
http://seequer.com/</p>
<p><strong>[HyperEstraier/*]</strong><br />
HyperEstraier<br />
Does not request robots.txt. From their website: a full-text search system for communities.</p>
<p><strong>[Mozilla/5.0 (?http://www.eurekster.com/mammoth) Mammoth/0.*]</strong><br />
Search Party<br />
Did not read robots.txt.</p>
<p><strong>[VisBot/2.* (Visvo.com Crawler; *)]</strong><br />
Visvo<br />
Thanks, Jonathan.</p>
<p><strong>[Mozilla/5.0 (compatible; ActiveTouristBot*; http://www.activetourist.com)]</strong><br />
ActiveTouristBot<br />
I personally worked with the owner of this bot to help him fine-tune its behavior. I&#8217;m satisfied now that it&#8217;s a well-behaved bot.<br />
Mozilla/5.0 (compatible; ActiveTouristBot V1.4 ; http://www.activetourist.com)</p>
<p><strong>[Mozilla/5.0 (compatible; Charlotte/1.0b; *)]</strong><br />
Charlotte<br />
Reads but does not respect robots.txt.<br />
Mozilla/5.0 (compatible; Charlotte/1.0b; charlotte@betaspider.com)</p>
<p><strong>[WebAlta Crawler/*]</strong><br />
WebAlta Crawler<br />
Did not request robots.txt. A Firefox user agent from the same IP address first checks a page to see what happens; if the result is the same, the search engine bot then gets the same page.<br />
WebAlta Crawler/1.3.18 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)</p>
<p><strong>[Fooky.com/ScorpionBot/ScoutOut;*]</strong><br />
ScorpionBot<br />
Did not read robots.txt.<br />
Fooky.com/ScorpionBot/ScoutOut; http://www.fooky.com/scorpionbots</p>
<p><strong>[WISEbot/*]</strong><br />
WISEbot<br />
Did not read robots.txt.<br />
WISEbot/1.0 (WISEbot@koreawisenut.com; http://wisebot.koreawisenut.com)</p>
<p><strong>[antibot-V*]</strong><br />
antibot<br />
According to WebmasterWorld, this is a French search engine that hasn&#8217;t crawled since around 2002 but is back again. I could not find a URL for it. It did read and respect robots.txt.<br />
antibot-V1.3.3.5/debian-sarge-pentium3</p>
<p><strong>[RedCell/* (*)]</strong><br />
RedCell<br />
From their website: I am trying to compile a security based search engine that should be pretty inclusive yet have no corporate sponsorship nor any type of advertisement or commercials.<br />
RedCell/0.1 (RedCell; telegenetic.net/bot.html; lhall_at_telegenetic.net)</p>
<p><strong>[Norbert the Spider(Burf.com)]</strong><br />
Norbert the Spider<br />
Did not read robots.txt.<br />
Norbert the Spider(Burf.com)</p>
<p><strong>[TerrawizBot/*]</strong><br />
TerrawizBot<br />
Reads but does obey robots.txt.<br />
TerrawizBot/1.0 ( http://www.terrawiz.com/bot.html)</p>
<p><strong>[zibber-v*]</strong><br />
Zibb<br />
Business-related search engine.<br />
zibber-v0.1(www.zibb.com/crawler/)</p>
<p><strong>[LocalcomBot/*]</strong><br />
LocalcomBot<br />
Beta search engine. Claims to index local websites. What the heck are local websites?<br />
LocalcomBot/1.2 ( http://www.local.com/bot.htm)<br />
LocalcomBot/1.2.2 ( http://www.local.com/bot.htm)</p>
<p><strong>[GOFORITBOT (?http://www.goforit.com/about/?)]</strong><br />
GoForIt<br />
Is this somehow related to GoDaddy? The IP address is registered to GoDaddy. The user agent starts with the word GO. From their website: GoForIt is an Internet guide &#8230; designed to make it easier for you to search the Internet and quickly find the information you are looking for. GoForIt combines the efficiency of our world-class meta-search engine with the largest and most comprehensive human-edited directory of the World Wide Web.<br />
GOFORITBOT ( http://www.goforit.com/about/ )</p>
<p><strong>[NavissoBot]</strong><br />
NavissoBot<br />
From their website: Navisso, a free search engine with the goal of becoming one of the most relevant and dependable search engines on the web.<br />
NavissoBot</p>
<p><strong>[YadowsCrawler*]</strong><br />
YadowsCrawler<br />
The search engine is still under development. I am told it respects robots.txt.</p>
<p><strong>[KRetrieve/]</strong><br />
KRetrieve<br />
It took numerous disallowed files before it even read robots.txt, and then it continued to take disallowed files.<br />
KRetrieve/1.1/dbsearchexpert.com</p>
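<p>Catching this behavior means comparing the order of one client&#8217;s requests against the site&#8217;s robots.txt. The following is a minimal Python sketch. It assumes a simplified log made of URL paths from a single client in the order received, plus a hypothetical robots.txt. It is an illustration only, not the log analyzer used for these notes.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site being protected.
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]


def audit(requests, user_agent):
    """Split disallowed requests into those made before and after robots.txt was read.

    `requests` is a list of URL paths from a single client, in the order received.
    """
    rp = RobotFileParser()
    rp.parse(ROBOTS_TXT)
    seen_robots = False
    before, after = [], []
    for path in requests:
        if path == "/robots.txt":
            seen_robots = True
            continue
        if not rp.can_fetch(user_agent, "http://example.com" + path):
            (after if seen_robots else before).append(path)
    return {"read_robots": seen_robots, "violations_before": before, "violations_after": after}


log = ["/private/a.html", "/private/b.html", "/robots.txt", "/private/c.html", "/index.html"]
print(audit(log, "KRetrieve/1.1/dbsearchexpert.com"))
# {'read_robots': True, 'violations_before': ['/private/a.html', '/private/b.html'], 'violations_after': ['/private/c.html']}
```
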
<p><strong>[webcrawl.net]</strong><br />
webcrawl.net<br />
Search engine that includes a directory based on ODP data.<br />
webcrawl.net</p>
<p><strong>[Amfibibot/*]</strong><br />
Amfibi<br />
Reads and respects robots.txt, although so far it has only requested robots.txt and my default page.<br />
Amfibibot/0.06 (Amfibi Robot; http://www.amfibi.com; agent@amfibi.com)<br />
Amfibibot/0.06 (Amfibi Web Search; http://www.amfibi.com; agent@amfibi.com)<br />
Amfibibot/0.07 (Amfibi Robot; http://www.amfibi.com; agent@amfibi.com)</p>
<p><strong>[FyberSpider*]</strong><br />
FyberSpider<br />
The folks at WebmasterWorld seem to like this spider. I disagree. It doesn&#8217;t read robots.txt. As of 22 Feb 2007, it no longer provides a URL in the user agent.<br />
http://www.fybersearch.com/<br />
FyberSpider ( http://www.fybersearch.com/fyberspider.php)</p>
<p><strong>[Mozilla/5.0 (compatible; NLCrawler/*]</strong><br />
Northern Light Web Search<br />
Did not read robots.txt.<br />
Mozilla/5.0 (compatible; NLCrawler/2.0.15; Linux 2.6.3-7; i686; en_US)KHTML/3.4.89 (like Gecko)</p>
<p><strong>[Mozilla/4.0(?compatible; MSIE 6.0; Qihoo *)]</strong><br />
Qihoo<br />
Very popular Chinese BBS and blog search engine.<br />
Mozilla/4.0( compatible; MSIE 6.0; Qihoo 0.9.6 )</p>
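<p>The bracketed patterns heading each entry appear to use * for any run of characters and ? for exactly one character. In the Qihoo pattern, the ? stands in for the space after "Mozilla/4.0(". Python&#8217;s fnmatch.fnmatchcase treats * and ? the same way, so it makes a rough matcher. Below is a minimal sketch. Note that fnmatch also treats [ as the start of a character class, which these patterns do not use. Real browscap tools also choose the most specific of several matching patterns, which this sketch does not attempt.</p>

```python
from fnmatch import fnmatchcase

# A few section patterns copied from the notes above.
PATTERNS = [
    "Mozilla/4.0(?compatible; MSIE 6.0; Qihoo *)",
    "curl/*",
    "Python*",
    "Mozilla/5.0 (*) VoilaBot BETA 1.*",
]


def matching_patterns(ua, patterns=PATTERNS):
    """Return every pattern that matches the whole user agent string (case-sensitive)."""
    return [p for p in patterns if fnmatchcase(ua, p)]


print(matching_patterns("Mozilla/4.0( compatible; MSIE 6.0; Qihoo 0.9.6 )"))
# ['Mozilla/4.0(?compatible; MSIE 6.0; Qihoo *)']
print(matching_patterns("Python-urllib/2.4"))
# ['Python*']
```
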
<p><strong>[miniRank/*]</strong><br />
miniRank<br />
You type in a domain and see a page that tells where you rank in their database. Big whoop.<br />
miniRank/2.0 (miniRank; http://minirank.com/; website ranking engine)</p>
<p><strong>[ASPSeek/*]</strong><br />
ASPSeek<br />
From their website: ASPseek is an Internet search engine software developed by SWsoft and licensed as free software under GNU GPL.<br />
ASPseek/1.2.10<br />
ASPseek/1.2.12<br />
ASPSeek/1.2.5</p>
<p><strong>[Mozilla/5.0 (compatible; CXL-FatAssANT (El Robeiro); http://www.conexcol.com/FatAssANT/; ANTid:alfa; v. 0.5.1)]</strong><br />
Conexcol.com<br />
Colombian search engine.<br />
Mozilla/5.0 (compatible; CXL-FatAssANT (El Robeiro); http://www.conexcol.com/FatAssANT/; ANTid:alfa; v. 0.5.1)</p>
<p><strong>[PEERbot*]</strong><br />
PEERbot<br />
It&#8217;s like a search engine and directory rolled-up into one.<br />
PEERbot www.peerbot.com</p>
<p><strong>[*Fluffy the spider*]</strong><br />
SearchHippo<br />
Their own web page about their spider makes no mention at all of robots.txt, and my analyzer shows it never even requested it when it visited my site.<br />
Mozilla/3.0 (compatible; Fluffy the spider; http://www.searchhippo.com/; info@searchhippo.com)</p>
<p><strong>[Filangy/*]</strong><br />
Filangy<br />
From their site: Filangy&#8217;s Patent Pending ActiveWeb search technology allows you to search pages that are most frequently accessed and offer up-to-date, useful information.<br />
Filangy/1.01 (Filangy; http://www.filangy.com/filangyinfo.jsp?inc=robots.jsp; filangy-agent@filangy.com)</p>
<p><strong>[MaSagool/*]</strong><br />
Sagoo<br />
Well-behaved crawler for Sagoo, a Japanese search engine.<br />
MaSagool/1.0 (MaSagool; http://sagool.jp/; masagool@sagool.jp)</p>
<p><strong>[SquigglebotBot/*]</strong><br />
SquigglebotBot<br />
Brought to you by TheCyberWeb. Ugh!<br />
SquigglebotBot/1.0 http://squigglebot.com</p>
<p><strong>[Deepindex]</strong><br />
Deepindex<br />
French search engine.<br />
Deepindex</p>
<p><strong>[LapozzBot/*]</strong><br />
LapozzBot<br />
Hungarian search engine.<br />
LapozzBot/1.3 ( http://robot.lapozz.com)<br />
LapozzBot/1.3 ( http://robot.lapozz.hu)<br />
LapozzBot/1.4 ( http://robot.lapozz.com)<br />
LapozzBot/1.4 ( http://robot.lapozz.hu)</p>
<p><strong>[Kolinka Forum Search (www.kolinka.com)]</strong><br />
Kolinka Forum Search<br />
It does not read robots.txt. From their website: At Project Kolinka we are developing a new way to search community driven web forums and message boards. The Kolinka spider only crawls forum content, requesting pages at a moderate rate of 1 page per second.</p>
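<p>A stated rate such as Kolinka&#8217;s one page per second is simple to enforce on the crawler side. Below is a minimal Python sketch of such a throttle. It assumes a single-threaded crawler, and Throttle is a hypothetical name. The clock and sleep functions can be swapped out so the logic can be checked without actually waiting.</p>

```python
import time


class Throttle:
    """Space out requests so that at least `interval` seconds separate their starts."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock    # returns the current time in seconds
        self.sleep = sleep    # pauses for the given number of seconds
        self.next_ok = None   # earliest time the next request may start

    def wait(self):
        """Pause if needed before the next request; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.next_ok is not None and self.next_ok > now:
            delay = self.next_ok - now
            self.sleep(delay)
        self.next_ok = now + delay + self.interval
        return delay
```

<p>A crawler would call wait() immediately before each request. At Kolinka&#8217;s stated rate, that means Throttle(interval=1.0).</p>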
<p><strong>[Qweery*]</strong><br />
QweeryBot<br />
Qweerybot is Qweery&#8217;s web-crawling robot. It collects documents from the web (mainly Dutch) to build a searchable index for the Qweery search engine (in development).<br />
QweeryBot v2.05 &#8212; http://qweerybot.qweery.com &#8212; check robots.txt</p>
<p><strong>[Tkensaku/*]</strong><br />
Tkensaku<br />
Japanese search engine.</p>
<p><strong>[Searchmee! Spider*]</strong><br />
Searchmee!<br />
It read robots.txt, but I don&#8217;t know yet if it respects it. From their website: This is a general purpose search engine. The goal is to produce the highest quality search results based on consensus of other web sites.<br />
Searchmee! Spider v0.98a</p>
<p><strong>[InfociousBot (?http://corp.infocious.com/tech_crawler.php)]</strong><br />
InfociousBot<br />
I saw a report about this crawler. The report stated this bot reads robots.txt but does not respect it. I&#8217;m currently testing that assertion myself.</p>
<p><strong>[*FDSE robot*]</strong><br />
FDSE Robot<br />
Fluid Dynamics Search Engine robot. Crawls remote sites as part of a shareware search engine program.<br />
Mozilla/4.0 (compatible: FDSE robot)</p>
<p><strong>[Mozilla/4.0 (compatible; MSIE *; Windows NT; Girafabot; girafabot at girafa dot com; http://www.girafa.com)]</strong><br />
Girafabot<br />
It read robots.txt but did not respect the defined exclusions.<br />
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; Girafabot; girafabot at girafa dot com; http://www.girafa.com)<br />
Mozilla/4.0 (compatible; MSIE 5.5; Windows NT; Girafabot; girafabot at girafa dot com; http://www.girafa.com)</p>
<p><strong>[Hotzonu/*]</strong><br />
Hotzonu<br />
Sketchy information at best leads me to believe this is a crawler associated with MegaBBS and somehow related to Infoseek in Japan. It does not read robots.txt.<br />
Hotzonu/2.0</p>
<p><strong>[Tarantula/*]</strong><br />
Tarantula<br />
The bot read and respected robots.txt, but only after it had started its crawl. The UA has no URL in it, only an e-mail address. That&#8217;s enough for me to ban it.<br />
Tarantula/0.1 (compatible; Mozilla 4.0; MSIE 5.5; mark_taran@speedymail.org)</p>
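<p>Checking whether a user agent carries a URL and an e-mail address can be automated. Below is a rough Python sketch. It assumes simple regular expressions are good enough, which means it will miss spelled-out addresses such as "girafabot at girafa dot com" in the Girafabot entry above. The function name contact_info is hypothetical.</p>

```python
import re

# Rough patterns: a URL with a scheme or a leading "www.", and a plain e-mail address.
URL_RE = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def contact_info(ua):
    """Report whether a user agent string carries a URL and/or an e-mail address."""
    return {"url": bool(URL_RE.search(ua)), "email": bool(EMAIL_RE.search(ua))}


print(contact_info("Tarantula/0.1 (compatible; Mozilla 4.0; MSIE 5.5; mark_taran@speedymail.org)"))
# {'url': False, 'email': True}
```
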
<h2>SEMC Browser</h2>
<p><strong>[SEMC Browser]</strong><br />
SEMC Browser<br />
From their website: The Sony Ericsson SEMC browser is an XHTML-capable browser. I can find no hint of what this browser supports other than some very simple CSS, so I&#8217;m hesitant to say it does anything more than support tables.</p>
<h2>Sensis</h2>
<p><strong>[Sensis Web Crawler (search_comments\at\sensis\dot\com\dot\au)]</strong><br />
Sensis Web Crawler<br />
Australian web crawler. Read and respected robots.txt. No URL to follow up on in the user agent.<br />
Sensis Web Crawler (search_comments\at\sensis\dot\com\dot\au)</p>
<h2>Shiira</h2>
<p><strong>[Shiira]</strong><br />
Shiira<br />
Shiira is a web browser for the Mac. It is based on WebKit and written in Cocoa.</p>
<h2>Site Monitors</h2>
<p><strong>[Site Valet Online*]</strong><br />
Site Valet<br />
From their website: Site Valet [is] a deluxe website monitoring service that integrates automated reporting with online tools.</p>
<p><strong>[Webcheck *]</strong><br />
Webcheck<br />
From their website: WebCheck web server monitoring software is the best and most comprehensive web site monitoring and web server monitoring software around.<br />
Webcheck 1.0</p>
<p><strong>[UpTime Checker*]</strong><br />
UpTime Checker<br />
From the Blue Data Networks website: Uptime Checker is different from other monitoring tools as it checks the uptime on a device using SNMP. Uptime Checker then uses the uptime to determine if a device has just restarted.</p>
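<p>The idea in that description fits in a few lines. If a device&#8217;s reported uptime is lower than the previous reading, the counter has started over, which usually means a restart. Below is a minimal Python sketch. It assumes the uptime values have already been fetched and are in SNMP TimeTicks (hundredths of a second, as sysUpTime reports them). It does not speak SNMP itself. It also ignores the 32-bit counter wrap after roughly 497 days, which would look like a restart too. check_restart is a hypothetical name.</p>

```python
def check_restart(previous_ticks, current_ticks):
    """Compare two sysUpTime readings given in hundredths of a second.

    Returns (restarted, uptime_seconds). A reading lower than the previous one
    is treated as a restart; counter wrap-around is not handled here.
    """
    restarted = previous_ticks is not None and previous_ticks > current_ticks
    return restarted, current_ticks / 100


print(check_restart(360000, 366000))  # (False, 3660.0)
print(check_restart(366000, 1500))    # (True, 15.0)
```
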
<p><strong>[WebPatrol/*]</strong><br />
WebPatrol<br />
From their website: WebPatrol visits your web site every 5 to 60 minutes, depending on account type, and checks for a pre-defined keyword in your HTML, such as &#8220;Acme&#8221;, or &#8220;Widgets&#8221;. If your web site is unreachable, returns an error message, or does not return your pre-specified keyword, WebPatrol can be configured to: send email messages or send numeric or alphanumeric pages.<br />
WebPatrol/2.0</p>
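<p>The check WebPatrol describes can be written as a small function: is the site reachable, does it return an error, is the keyword present? Below is a minimal Python sketch. It assumes the HTTP status and page body have already been fetched. The function name and the reason strings are hypothetical, not WebPatrol&#8217;s own.</p>

```python
def keyword_check(status, body, keyword):
    """Return None if the page looks healthy, else a short reason to alert on.

    `status` is the HTTP status code, or None if the site could not be reached.
    """
    if status is None:
        return "unreachable"
    if status >= 400:
        return "error %d" % status
    if keyword not in body:
        return "keyword %r missing" % keyword
    return None


print(keyword_check(200, "Acme Widgets home page", "Acme"))  # None
print(keyword_check(503, "", "Acme"))                         # error 503
print(keyword_check(200, "Under construction", "Widgets"))   # keyword 'Widgets' missing
```
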
<p><strong>[*EasyRider*]</strong><br />
EasyRider<br />
From their website: Easyrider LAN Pro owns and operates Vigilance Monitoring, a professionally staffed, proactive computer server, applications and network monitoring company. We can use sniffers and other tools to audit sites, measure traffic patterns, identify bottlenecks and recommend improvements. We can use our professional monitoring tools to baseline your equipment and do capacity planning for future growth.<br />
Mozilla/4.0 (compatible; MSIE 3.02; ARM; EasyRider/1.4.1-GR-BR-YT; IA-PAL)<br />
Mozilla/4.0 (compatible; MSIE 3.02; ARM; EasyRider/1.4.2-GR-BR-YT; IA-PAL)</p>
<p><strong>[Net Probe]</strong><br />
Net Probe<br />
From their website: Net Probe is a site list maintenance utility. It scans for FTP and web sites, and checks them to see if they are alive or have changed since the last time the site was checked.<br />
Net Probe</p>
<p><strong>[*Netcraft Webserver Survey*]</strong><br />
Netcraft Webserver Survey<br />
Allows you to determine what operating system and web server a given site is running. It also reports uptimes for most sites. It is a well-respected site even though it ignores robots.txt.<br />
Mozilla/4.0 (compatible; Netcraft Webserver Survey)</p>
<h2>Snap</h2>
<p><strong>[Snap]</strong><br />
Snap<br />
Snap appears to be an image crawler. All of its bots ignore robots.txt.<br />
http://www.snap.com</p>
<h2>Social Bookmarkers</h2>
<p><strong>[WinkBot/*]</strong><br />
WinkBot<br />
A bookmark service with a pretty nice search feature.<br />
http://www.wink.com<br />
WinkBot/0.06 (Wink.com search engine web crawler; http://www.wink.com/Wink:WinkBot; winkbot@wink.com)</p>
<h2>Sogou</h2>
<p><strong>[Sogou]</strong><br />
Sogou<br />
Chinese search engine. Does not read robots.txt.<br />
http://www.sogou.com/</p>
<h2>The Planet</h2>
<p><strong>[The Planet]</strong><br />
The Planet&#8217;s Vulnerability Scanning<br />
These are the user agents used by ThePlanet.com when doing Vulnerability Scanning at a customer&#8217;s request.</p>
<h2>Translators</h2>
<p><strong>[Seram Server]</strong><br />
Seram Server<br />
Server-based translation service from Sunda Systems Oy in Finland.<br />
Seram Server</p>
<p><strong>[WebIndexer/* (Web Indexer; *)]</strong><br />
WorldLingo<br />
Free text and URL machine translations. Paid human translations.<br />
WebIndexer/1-dev (Web Indexer; mailto://webindexerv1@yahoo.com; webindexerv1@yahoo.com)</p>
<p><strong>[WebTrans]</strong><br />
WebTrans<br />
I think this is the right website for this user agent. I am awaiting e-mail confirmation from info@webtrans.de.<br />
WebTrans</p>
<p><strong>[ATA-Translation-Service]</strong><br />
ATA-Translation-Service<br />
Looks like an online translation tool similar to Babelfish. Possibly related to www.atanet.org/.<br />
ATA-Translation-Service</p>
<h2>Version Checkers</h2>
<p><strong>[GJK_Browser_Check]</strong><br />
GJK_Browser_Check<br />
This is the user agent checker from my website.<br />
GJK_Browser_Check</p>
<p><strong>[Browser Capabilities Project (http://browsers.garykeith.com; http://browsers.garykeith.com/sitemail/contact-me.asp)]</strong><br />
Gary Keith&#8217;s Version Checker<br />
This is the user agent I use when I&#8217;m checking for updates for files like the ones at iplists.com and hpc-factor.</p>
<p><strong>[Code Sample Web Client]</strong><br />
Code Sample Web Client<br />
Related to a Browscap updater. From the Netherlands. This is one of my forum members testing a new application.</p>
<p><strong>[Subtext Version 1.9* - http://subtextproject.com/ (Microsoft Windows NT 5.2.*)]</strong><br />
Subtext<br />
I&#8217;m not sure what this blogging engine is doing. I figured it would be some sort of browscap updating tool, but so far it has only requested pages that don&#8217;t exist.<br />
http://subtextproject.com/</p>
<p><strong>[browsers.garykeith.com browscap.ini bot BETA]</strong><br />
Version Checkers<br />
This is being used by someone at Clarkson University in Potsdam, NY USA. I wish the person using it would contact me so I know who it is. I don&#8217;t have a problem with it. I just think it&#8217;s polite to include a URL and a contact e-mail in user agents and I&#8217;m disappointed they aren&#8217;t doing this.<br />
browsers.garykeith.com browscap.ini bot BETA</p>
<p><strong>[BrowscapUpd