Just after my previous cleanup, I got a much worse virus on my PC. Both Norton AntiVirus (Symantec) and Avast (Alwil Software) call it Worm.Win32.Mefir [a-z]; NOD32 identifies it as Win32/Virut.NAT.
So far it has infected *.html and *.php files, and probably all text/html types. There is no cure yet. I hate this worm; I've lost a few projects because of it. Archive (LZMA) all your web projects beforehand. It's spreading like wildfire.
I haven't tried cleaning the infected files with Trend HouseCall online scans yet. I just hope there is a cure for this worm. Damn, damn.
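The archiving advice above could be scripted roughly like this. It is only a sketch: the function name and paths are made up for illustration, and it assumes that a project packed into a single LZMA-compressed file is less exposed than loose .html and .php files, which I have not verified against this worm.

```python
import tarfile
from pathlib import Path


def archive_project(project_dir, archive_path):
    """Pack a project directory into an LZMA-compressed (.tar.xz) archive."""
    project = Path(project_dir)
    # "w:xz" selects LZMA/XZ compression in the standard tarfile module.
    with tarfile.open(archive_path, "w:xz") as tar:
        # Store files under the project folder name, not the absolute path.
        tar.add(str(project), arcname=project.name)
    return archive_path
```

Calling `archive_project("my_site", "my_site.tar.xz")` would write one compressed file that can be copied to a clean machine or external disk.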
- November 11, 2007 at 11:20 pm
- November 19, 2007 at 4:47 pm
I just downloaded Google Pack with Norton, and the first scan flagged my favorite SVN client, TortoiseSVN, as infected with W32.Virut.W.
Excerpt from Symantec
W32.Virut.A is a virus that infects executable files and opens a back door on TCP port 65520 by connecting to a predefined IRC server.
Netstat
netstat -aob > netstat.log
TCP USER:1028 78.109.19.140.in.hosting.ua:65520 ESTABLISHED 936 [winlogon.exe]
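A saved log like this can be checked for the back-door port with a few lines of Python. This is a rough sketch, not a detection tool: the function name is made up, and it assumes each connection sits on one line in the column order shown above (protocol, local address, remote address, state, PID, optional process name).

```python
from typing import Dict, List


def find_backdoor_connections(netstat_text: str, port: int = 65520) -> List[Dict[str, str]]:
    """Return TCP connections whose remote port matches the given port."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full TCP connection row.
        if len(fields) < 5 or fields[0] != "TCP":
            continue
        # The remote address is "host:port"; split on the last colon.
        host, _, remote_port = fields[2].rpartition(":")
        if remote_port == str(port):
            hits.append({
                "local": fields[1],
                "remote_host": host,
                "state": fields[3],
                "pid": fields[4],
                # Process name is only present if the log includes it.
                "process": fields[5] if len(fields) > 5 else "",
            })
    return hits
```

Feeding it the contents of netstat.log would list every connection to remote port 65520 together with the owning PID.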
The free version of Norton Internet Scan failed to fix the virus. :(
- November 11, 2007 at 11:44 am
- November 17, 2007 at 1:36 am
Provided courtesy of http://browsers.garykeith.com
Created October 28, 2007 at 11:35:15 PM GMT
Wikipedia: User Agent
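The bracketed entries in this list, such as TurnitinBot/* or niXXieBot?Foster*, read as wildcard patterns for user agent strings. Here is a minimal sketch of how such a pattern could be tested, assuming `*` stands for any run of characters, `?` for exactly one character, and that matching ignores case (the plain `wget` sample listed under Wget* suggests it does). The function names are my own and not part of browscap.

```python
import re


def pattern_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters, including none
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    # Assumption: patterns are matched case-insensitively.
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, user_agent):
    """True if the whole user agent string fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(user_agent) is not None
```

The surrounding square brackets are delimiters in the list, so they are left off when a pattern is passed in, for example `matches("Wget*", "Wget/1.10.2")`.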
Accoona
[Accoona-AI-Agent/* (crawler at accoona dot com)]
Accoona-AI-Agent
UA as of November 2005.
Accoona-AI-Agent/1.1.1 (crawler at accoona dot com)
Ad Brokers
[Ad Brokers]
Ad Brokers
This is for minor ad brokers. The major ad-related SE UAs like Google are in their respective parents.
Amazon.com
[Intelix/*]
Intelix
Crawling from Amazon.com IP Address 72.44.63.10
http://www.microton.cz/intelix/
Become
[MonkeyCrawl/*]
MonkeyCrawl
BitTorrent search engine from Exava.
MonkeyCrawl/0.05 (MonkeyCrawl; http://www.monkeymethods.org; )
Best of the Web
[Mozilla/4.0 (compatible; BOTW Spider; *http://botw.org)]
BOTW Spider
Does not request robots.txt like all BOTW bots.
Blue Coat Systems
[Blue Coat Systems]
Blue Coat Systems
Content filtering products.
http://www.cerberian.com/
Copyright/Plagiarism
[IPiumBot laurion(dot)com]
IPiumBot
I see this as a French plagiarism bot, looking for copyright infringements and such. Did not read robots.txt.
IPiumBot laurion(dot)com
[TurnitinBot/*]
TurnitinBot
Greedy little bastard related to SlySearch and uses the same IP. Does not always respect robots.txt.
TurnitinBot/1.5 (http://www.turnitin.com/robot/crawlerinfo.html)
TurnitinBot/1.5 http://www.turnitin.com/robot/crawlerinfo.html
TurnitinBot/2.0 (http://www.turnitin.com/robot/crawlerinfo.html)
TurnitinBot/2.0 http://www.turnitin.com/robot/crawlerinfo.html
[oBot]
oBot
obot is a spider sent out by a company in Germany called ONLY Solutions. They scan the web looking for sites that infringe on copyrights and logos of their clients.
oBot
[SlySearch/*]
SlySearch
Works in conjunction with Plagiarism.org and TurnItIn.com
Danger
[Danger]
Danger
Danger is actually a service currently being used by a number of mobile devices, most notably T-Mobile’s Sidekick. It’s my understanding the AvantGo browser is an unnamed part of the package.
http://www.danger.com/index.php
Dillo
[Dillo]
Dillo
Dillo is a small web browser written in C.
Directories
[acontbot]
acontbot
German search engine.
[Poirot]
Poirot
URL returns a blank page. No markup at all. IP Address is owned by ThePlanet. No robots.txt. Made second request using LWP::Simple/5.803.
Poirot
[Mackster (*)]
Mackster
Mackster seems to feed a few search engines.
[Misterbot]
Misterbot
French directory/search engine.
Misterbot
[Findexa Crawler (http://www.findexa.no/gulesider/article26548.ece)]
Findexa Crawler
From their website: Findexa is the leading directory publisher in Norway, and the official publisher of telecommunication company Telenor. Findexa publishes four directory brands in Norway - BizKit, Telefonkatalogen, Gule Sider and Ditt Distrikt.
Findexa Crawler (http://www.findexa.no/gulesider/article26548.ece)
[Mozilla/5.0 (Votay bot/*)]
Votay
No robots.txt. Directory with user voting to determine site ranking.
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/arts/comics/)
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/business/employment/)
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/recreation/models/)
Mozilla/5.0 (Votay bot/4.0; http://www.votay.com/shopping/recreation/)
[aipbot/*]
aipbot
Reads robots.txt but it won’t index any of my sites for their directory.
aipbot/1.0 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)
aipbot/2 (aipbot; http://www.aipbot.com; aipbot@aipbot.com)
aipbot/2-beta (aipbot dev; http://aipbot.com; aipbot@aipbot.com)
[FirstGov.gov Search - POC:firstgov.webmasters@gsa.gov]
FirstGov.gov Search
AT&T/Fast Search robot for FirstGov (U.S. Government) portal
FirstGov.gov Search - POC:firstgov.webmasters@gsa.gov
[Mozilla/5.0 (?http://www.toile.com/) ToileBot/*]
Toile
This appears to be a very popular French directory. Submissions are screened by humans and if accepted a bot comes around occasionally to validate links. They get secondary (backfill) results from Google.
DNS Tools
[Domain Dossier utility*]
Domain Dossier
Free DNS tools. The IP Address for this bot was 70.84.211.98 which has a PTR of mail.webpal.info. If you go to www.webpal.info it’s the login page for someone’s website control panel.
[DNSGroup/*]
DNS Group Crawler
Website in URL cannot be accessed. E-mail bounced (This address no longer accepts mail).
DNSGroup/0.1 (DNS Group Crawler; http://www.dnsgroup.com/; crawler@dnsgroup.com)
Download Managers
[LMQueueBot/*]
LMQueueBot
On my site it did obey robots.txt but since it only read the index.asp page after that I can’t say conclusively if it obeys robots.txt. Regardless, on my sites all download managers are banned.
LMQueueBot/0.1
LMQueueBot/0.2
[BitTorrent/*]
BitTorrent
P2P Client. Not sure why it’s browsing my websites.
BitTorrent/3.4.2
[Vegas95/*]
Vegas95
Downloads.asp abuser from Japan
Vegas95/1.03 (WinNT; I)
[Star*Downloader/*]
StarDownloader
From their website: Star Downloader is a download manager that accelerates your downloads by splitting the files into several parts and downloading them simultaneously. Download speeds are increased further by choosing the fastest mirror sites.
StarDownloader/1.44
StarDownloader/1.52
[GetRightPro/*]
GetRightPro
This is a download manager. On my site it’s being used abusively to repeatedly download my files way too quickly.
GetRightPro/6.0a
GetRightPro/6.0b
GetRightPro/6.0beta7
[AutoMate5]
AutoMate5
Part of an automation package from Network Automation that includes FTP downloads.
AutoMate5
[shareaza*]
shareaza
From Wikipedia: Shareaza is a free Windows–based peer-to-peer client which supports the Gnutella, Gnutella2, EDonkey Network, BitTorrent, FTP and HTTP network protocols.
http://www.shareaza.com/
[Xaldon WebSpider*]
Xaldon WebSpider
This is a product from Germany that is basically a download manager. It did not read robots.txt so it’s a website stripper.
Xaldon WebSpider 2.7.b6
[Mozilla/4.0 (compatible; Getleft*)]
Getleft
From their website: So here is my little effort, it is supposed to download complete Web sites. You give it an URL, and down it goes on, happily downloading every linked URL in that site.
Mozilla/4.0 (compatible; Getleft 1.1.1)
Mozilla/4.0 (compatible; Getleft 1.1.2)
Mozilla/4.0 (compatible; Getleft 1.1b2)
[Wget*]
Wget
GNU file downloader.
wget
wget libfetch/2.0
Wget/1.10
Wget/1.10-rc1
Wget/1.10.1
Wget/1.10.1 (Red Hat modified)
Wget/1.10.1-beta1
Wget/1.10.2
Wget/1.10.2 (Red Hat modified)
Wget/1.4.5
Wget/1.5.2
Wget/1.5.3
Wget/1.5.3.1
Wget/1.5.3gold
Wget/1.6
Wget/1.7
Wget/1.7.1
Wget/1.8
Wget/1.8.1
Wget/1.8.1 cvs
Wget/1.8.2
Wget/1.8.2 modified
Wget/1.9
Wget/1.9 cvs-dev
Wget/1.9 cvs-stable
Wget/1.9 cvs-stable (Red Hat modified)
Wget/1.9-beta
Wget/1.9-beta-unoff
Wget/1.9.1
Wget/1.9.1 WebWasher 3.3
[Prozilla*]
Prozilla
From their website: ProZilla is a download accelerator for Linux which gives you a 200% to 300% improvement in your file downloading speeds.
Prozilla - Download accelerator for Linux1.3.6
[NetPumper*]
NetPumper
From their website: It’s time to stop downloading and start pumping! NetPumper is a new Download Manager that makes downloading files from the Internet easier, faster and safer. Does not read robots.txt and downloads data at an incredibly fast rate of speed.
NetPumper Pro/0.1
NetPumper/1.02
NetPumper/1.03
[Kontiki Client*]
Kontiki Client
Plain and simple it’s a download accelerator. According to PC Magazine, “Kontiki was by far the best program at accelerating transfers. Kontiki significantly speeded up most of our downloads while intruding little on to our test machine.”
Kontiki Client 1.0.20517.1
Kontiki Client 2.0.21031.0 (2a60753a-3587-a47b-6465-8afd59ed1808)
Kontiki Client 2.01.21211.2 (2a60753a-3587-a47b-6465-8afd59ed1808)
Kontiki Client 2.01.21211.2 (61fce3ba-8a4e-bf01-49f6-7109a56a08b0)
Kontiki Client 2.10.30418.1
[Go!Zilla*]
GoZilla
This is made to look like a Go!Zilla clone but it’s really checking for formmail vulnerabilities.
Go!Zilla 3.3 (www.gozilla.com)
Go!Zilla 3.5 (www.gozilla.com)
[BitBeamer/*]
BitBeamer
BitBeamer is a fully featured FTP client and a download manager that integrates into your web browser.
BitBeamer/1.0
[FreshDownload/*]
FreshDownload
From their website: Fresh Download is an easy-to-use and very fast download manager software that turbo charges downloading files from the Internet, such as your favorite mp3 files, software, picture collections, video, etc.
FreshDownload/4.40
[lftp/3.2.1]
lftp
Russian-based FTP program. It seems most folks on WMW don’t like it. I haven’t decided yet whether or not to ban it.
lftp/3.2.1
DYNAMIC
[DYNAMIC]
DYNAMIC
Does not read robots.txt. I have no idea what this company does. Their website is essentially a blank page.
E-Mail Harvesters
[*Larbin*]
Larbin
General purpose crawler. Can be configured for a variety of tasks including e-mail harvesting. The user agent can be customized but always includes Larbin somewhere in it. It does read robots.txt but I don’t know if it fully respects it as all it did was read my robots.txt file before leaving. Besides everything I have ever read about how this bot is used leads me to believe it should be banned.
larbin (protee@gmail.com)
larbin (samualt9@bigfoot.com)
larbin protee@gmail.com
larbin samualt9@bigfoot.com
larbin sebastien.ailleret@inria.fr
LARBIN-EXPERIMENTAL (efp@gmx.net)
larbin_2.1.1 larbin2.1.1@somewhere.com
larbin_2.2.0 (crawl@compete.com)
larbin_2.2.0 crawl@compete.com
larbin_2.6.2 (kalou@kalou.net)
larbin_2.6.2 (larbin2.6.2@unspecified.mail)
larbin_2.6.2 (ramiro@cs.cornell.edu)
larbin_2.6.2 (vitalbox1@hotmail.com)
larbin_2.6.2 larbin2.6.2@unspecified.mail
larbin_2.6.2 ramiro@cs.cornell.edu
larbin_2.6.3 (admins@uptime.at)
larbin_2.6.3 (alex.victoria@trilogy.com)
larbin_2.6.3 (aol@aol.com)
larbin_2.6.3 (crawler@ip2site.com)
larbin_2.6.3 (gqnmgsp@ruc.edu.cn)
larbin_2.6.3 (larbin-2.6.3@unspecified.mail)
larbin_2.6.3 (larbin2.6.3@ruc.edu.cn)
larbin_2.6.3 (larbin2.6.3@unspecified.mail)
larbin_2.6.3 (larbin2.6.3@verisignlabs.com)
larbin_2.6.3 (larbin2.6.3@versign.com)
larbin_2.6.3 (ltaa_web_crawler@groupes.epfl.ch)
larbin_2.6.3 (n.sugandh@epfl.ch)
larbin_2.6.3 (pimenas@softnet.tuc.gr)
larbin_2.6.3 (sneha@iitk.ac.in)
larbin_2.6.3 (wgao@cs.dal.ca)
larbin_2.6.3 (wgao@genieknows.com)
larbin_2.6.3 admins@uptime.at
larbin_2.6.3 aol@aol.com
larbin_2.6.3 crawler@ip2site.com
larbin_2.6.3 gqnmgsp@ruc.edu.cn
larbin_2.6.3 larbin-2.6.3@unspecified.mail
larbin_2.6.3 larbin2.6.3@ruc.edu.cn
larbin_2.6.3 larbin2.6.3@unspecified.mail
larbin_2.6.3 ltaa_web_crawler@groupes.epfl.ch
larbin_2.6.3 n.sugandh@epfl.ch
larbin_2.6.3 pimenas@softnet.tuc.gr
larbin_2.6.3 sneha@iitk.ac.in
larbin_2.6.3 wgao@cs.dal.ca
larbin_2.6.3 wgao@genieknows.com
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) (Tomi.Silander@hiit.fi)
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) (tsilande@hiit.fi)
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) Tomi.Silander@hiit.fi
larbin_2.6.3_for_(http://cosco.hiit.fi/search/) tsilande@hiit.fi
larbin_extended (larbin@oktie.com)
larbin_extended larbin@oktie.com
larbin_test (nobody@airmail.etn)
larbin_test nobody@airmail.etn
Mozilla/4.0 (compatible; MSIE 6.0; AOL 8.0; SV1; .NET CLR 1.1.4322; Windows NT 5.1) (larbin@unspecified.mail)
Mozilla/4.0 (compatible; MSIE 6.0; AOL 8.0; SV1; .NET CLR 1.1.4322; Windows NT 5.1) larbin@unspecified.mail
Mozilla/4.0 (compatible; MSIE6.0; Windows NT 5.1; Maxthon;) (larbin2.6.3@unspecified.mail)
Mozilla/4.0 (compatible; MSIE6.0; Windows NT 5.1; Maxthon;) larbin2.6.3@unspecified.mail
Mozilla/5.0 (larbin@unspecified.mail)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 (larbin@unspecified.mail)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 larbin@unspecified.mail
Mozilla4.0 (larbin2.6.3@unspecified.mail)
Mozilla4.0 larbin2.6.3@unspecified.mail
[*www4mail/*]
www4mail
From their website: www4mail is an open source application, that allows you to navigate off-line and search the whole Internet via electronic mail (e-mail) by using any standard Web browser and a MIME (Multipurpose Internet Mail Exchange) aware e-mail program. It did not request robots.txt.
Mozilla/4.0 www4mail/3.0 libwww-FM/2.14 (Unix; I)
Mozilla/4.5 www4mail/3.0 libwww-FM/2.14 (Unix; I)
www4mail/2.4 libwww-FM/2.14 (Unix; I)
[EVE-minibrowser/*]
EVE-minibrowser
EVE Online is a MMOG. According to Project Honeypot EVE-minibrowser is being used extensively as an e-mail harvester. For that reason I’ve put it with this parent and banned it.
http://www.projecthoneypot.org/bsh_X19tb2RlPWdsb2JhbCZ1YWc9RVZFLW1pbmlicm93c2VyJTJGMy4w
http://en.wikipedia.org/wiki/EVE_Online
[Franklin Locator*]
Franklin Locator
See the Links for a discussion on WebmasterWorld.
[Mozilla/4.0 (compatible; Advanced Email Extractor*)]
Advanced Email Extractor
http://www.mailutilities.com/aee/
Mozilla/4.0 (compatible; Advanced Email Extractor v2.76)
Mozilla/4.0 (compatible; Advanced Email Extractor*)
[*E-Mail Address Extractor*]
E-Mail Address Extractor
This is one of many products from Bejing Express.
ELinks 0.10
[ELinks 0.10]
ELinks
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.
ELinks 0.11
[ELinks 0.11]
ELinks
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.
ELinks 0.12
[ELinks 0.12]
ELinks
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.
ELinks 0.9
[ELinks 0.9]
ELinks
From their website: ELinks is an advanced and well-established feature-rich text mode webbrowser. ELinks can render both frames and tables, is highly customizable and can be extended via Lua or Guile scripts.
Emacs/W3
[Emacs/W3]
Emacs/W3
Emacs/W3 is a full-featured web browser, written entirely in Emacs-Lisp.
Entireweb
[Entireweb]
Entireweb
Reads but does not respect robots.txt.
Envolk
[Envolk]
Envolk
Even after an upgrade, and stating on their bot page that they read and respect robots.txt, it’s just not true.
Exalead
[Mozilla/5.0 (compatible; Exabot/3.0;*)]
Exabot
2007/1/18: They finally created a proper UA so this one isn’t banned like the others are.
[Exalead NG/*]
Exalead NG
This is Exalead’s image preview bot.
Exalead NG/MimeLive Client (convert/http/0.123)
Exalead NG/MimeLive Client (convert/http/0.129)
Exalead NG/MimeLive Client (convert/http/0.141)
Exalead NG/MimeLive Client (convert/http/0.143)
Exalead NG/MimeLive Client (convert/http/0.146)
[Exalead]
Exalead
French search engine. Does not read or respect robots.txt.
http://www.exalead.com/search
[NG-Search/*]
NG-SearchBot
German search engine. Well behaved software, respected robots.txt.
NG-Search/0.90 (NG-SearchBot; http://www.ng-search.com; )
[ng/*]
Exalead Previewer
This is Exalead’s image preview bot.
NG/1.0
Feeds Blogs
[Net::Trackback/*]
Net::Trackback
From their website: This package is an object-oriented interface for developing Trackback clients and servers.
Feeds Syndicators
[RSS-SPIDER (*)]
Feeds Syndicators
Looks at default root page for RSS tag(s).
[Mozilla/5.0 (*aggregator:TailRank; http://tailrank.com/robot)*]
TailRank
From their website: Tailrank is a service that monitors blogs trying to find interesting memes and hot stories. We have a ‘robot’ which analyzes blogs periodically trying to find interesting stories. If we find that a story on your site is ‘hot’ we promoted it to our front page. This is a good thing and can drive a lot traffic to your website.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1; aggregator:Tailrank; http://tailrank.com/robot) Gecko/20021130
[SimplePie/*]
SimplePie
SimplePie is a very fast and easy-to-use class, written in PHP, for reading RSS and Atom syndication feeds.
SimplePie/1.0 Beta (Feed Parser; http://www.simplepie.org/; Allow like Gecko) Build/20060129
[MagpieRSS/* (*)]
MagpieRSS
Older version of what has become SimplePie.
[Feedreader * (Powered by Newsbrain)]
Newsbrain
I haven’t been able to find out any information about Newsbrain. Sure wish they’d include a URL!
Feedreader 3.01 (Powered by Newsbrain)
[RssBandit/*]
RssBandit
Very abusive bot. I have them banned via my httpd.ini.
RssBandit/1.3.0.42 (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org)
RssBandit/1.3.0.45 (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org)
RssBandit/1.3.0.45 (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org) (.NET CLR 1.1.4322.2032; WinNT 5.1.2600.0; http://www.rssbandit.org)
[Akregator/*]
Akregator
From their website: Akregator is a news feed reader for the KDE desktop.
http://akregator.kde.org/
[Feed43 Proxy/* (*)]
Feed For Free
Does not read robots.txt.
[FeedBurner/*]
FeedBurner
Reads but does not respect robots.txt.
http://www.FeedBurner.com
[Particls]
Particls
From their website: Particls helps you track your favourite sites, topics and apps by displaying desktop alerts for important changes.
http://www.particls.com
[Mozilla/5.0 (RSS Reader Panel)]
RSS Reader Panel
RSS feed reader extension for Mozilla Firefox.
Mozilla/5.0 (RSS Reader Panel)
[intraVnews/*]
intraVnews
From their website: Feed Reader and RSS Aggregator for Outlook.
intraVnews/1.1 (http://www.intravnews.com/)
intraVnews/1.12 (http://www.intravnews.com/)
[Mobitype * (compatible; Mozilla/*; MSIE *.*; Windows *)]
Mobitype
Appears to be a France-based mobile RSS Feed Aggregator that focuses on blogs. The site is mostly in French.
http://www.mobitype.com/
[Cocoal.icio.us/* (*)*]
Cocoal.icio.us
No robots.txt. This appears to be some sort of RSS/podcast search engine. I have no idea why it’s crawling my websites.
Cocoal.icio.us/1.0 (v38) (Mac OS X; http://www.scifihifi.com/cocoalicious)
[Strategic Board Bot (?http://www.strategicboard.com)]
Strategic Board Bot
Did not read robots.txt. From their website: Strategic Board is a Web 2.0 search engine that aggregates IT related RSS feeds. We automatically monitor and identify new IT related blogs.
Strategic Board Bot ( http://www.strategicboard.com)
[*NetVisualize*]
NetVisualize
From their website: NetVisualize Favorites Organizer lets you manage your bookmarks and favorites the way you remember them - visually! NetVisualize creates thumbnail images of your favorite websites, and is as simple to use and familiar as Windows Explorer.
Mozilla/4.0 (compatible; MSIE 5.0; NetVisualize b202)
Mozilla/4.0 (compatible; MSIE 5.0; NetVisualize b203)
[Omnipelagos*]
Omnipelagos
From their website: Omnipelagos finds the shortest paths between any two things. I don’t know what that means.
[JetBrains Omea Reader*]
Omea Reader
From their website: Omea Reader is an easy to use, all-in-one RSS/ATOM feed reader, newsgroup reader, and web bookmark manager. From my point of view: That may be true but it’s being used to check the wrong page for browscap.ini updates so I need to ban it.
JetBrains Omea Reader 1.0.2 (http://www.jetbrains.com/omea_reader/)
JetBrains Omea Reader 1.0.4 (http://www.jetbrains.com/omea_reader/)
JetBrains Omea Reader 2.0 (http://www.jetbrains.com/omea/reader/)
JetBrains Omea Reader 2.0 Release Candidate 1 (http://www.jetbrains.com/omea_reader/)
JetBrains Omea Reader 2.0 Release Candidate 6 (http://www.jetbrains.com/omea/reader/)
JetBrains Omea Reader 2.1.2 (http://www.jetbrains.com/omea/reader/)
JetBrains Omea Reader 2.1.6 (http://www.jetbrains.com/omea/reader/)
Flatland Industries
[Flatland Industries]
Flatland Industries
Log spammer.
FrontPage
[FrontPage]
FrontPage
From their website: iSiloX is the desktop application that converts content to the iSilo 3.x/4.x document format, enabling you to carry that content on your Palm OS PDA, Pocket PC PDA, Windows CE Handheld PC, or Windows computer for viewing using iSilo.
General Crawlers
[DomainsDB.net MetaCrawler*]
DomainsDB
Reverse IP and NS lookup tool.
DomainsDB.net MetaCrawler v.0.9.7b (http://domainsdb.net/)
DomainsDB.net MetaCrawler v.0.9.7c (http://domainsdb.net/)
[iVia Page Fetcher*]
iVia Software
Claims to respect robots.txt but it never even read it.
iVia Page Fetcher (http://ivia.ucr.edu/useragents.shtml)
[Mozilla/5.0 (compatible; AboutUsBot/*)]
AboutUsBot
Did not read robots.txt. From their website: Gathers descriptive information about a website from several sources to build a Wiki Page.
[WeBoX/*]
WeBoX
From their website: Web Collector & Text Collector & Web Database & Tab Browser & Tab Editor & RSS Reader. Basically a general crawler. It’s written in Japanese.
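Many entries in this list note whether a bot reads or respects robots.txt. As a rough illustration of what respecting it means, here is a sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the rules, and the URLs are invented for the example and do not come from any site in this list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is asked to stay away entirely,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A well-behaved crawler runs a check like this before every request and skips anything that comes back False; the bots flagged above as not respecting robots.txt either never fetch the file or ignore the answer.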
[BabalooSpider/1.*]
BabalooSpider
Comes from same IP Address as Exploder/0.1. As of 25 February 2007 the website is just a placemarker.
http://www.babaloo.si
[KBeeBot/0.*]
KBeeBot
No robots.txt.
[Nozilla/P.N (Just for IDS woring)]
Nozilla/P.N
Looking for World of Warcraft vulnerabilities.
[ScollSpider/2.*]
ScollSpider
Despite the claims on their website this bot does not read robots.txt.
http://www.webwobot.com/ScollSpider.php
[Lorkyll *.* -- lorkyll@444.net]
Lorkyll
I’ve had traffic from this bot’s netrange and banned individual addresses. But as of February 2007 the number of bad bots coming from the C class is enough for me to ban the C class via firewall.
[West Wind Internet Protocols*]
Versatel
No robots.txt.
http://www.versanet.de
[Marvin v0.3]
MedHunt
Marvin (Multi-Agent Retrieval Vagabond on Information Networks) is a medical information spider linked to MedHunt.
Marvin v0.3
[*autokrawl*]
autokrawl
Read robots.txt way too late in the crawl. I also saw Voyager/1.0 crawling from the same IP Address.
[Comodo HTTP(S) Crawler*]
Comodo HTTP Crawler
They appear to be SSL providers so why are they crawling my website? Both URLs in the user agent redirect to a home page. It did read and appear to obey robots.txt so I’ll try blocking it manually for now.
Comodo HTTP(S) Crawler - http://www.instantssl.com/crawler
Comodo HTTP(S) Crawler - http://www.instantssl.com/crawler, http://www.whichssl.com/crawler
[Cynthia 1.0]
Cynthia
From their website: Cynthia is a web content accessibility validation solution, it is designed to identify errors in design related to Section 508 standards and the WCAG guidelines.
Cynthia 1.0
[Diff-Engine*]
General Crawlers
It reads robots.txt and appears to obey it, although I have no idea what it is or what it does.
Diff-Engine (Liang.Lu@cern.ch)
Diff-Engine Liang.Lu@cern.ch
[FRSEEKBOT]
FRSEEKBOT
This is a French search engine.
[HTTP-Test-Program]
WebBug
WebBug lets you enter a URL, then displays exactly what it sends to the Web Server and, when the response is received, exactly what the Web Server sends back.
HTTP-Test-Program
[http://www.almaden.ibm.com/cs/crawler*]
IBM’s WebFountain
The information collected from the web is currently being used in IBM’s Research Division for several search/indexing projects.
http://www.almaden.ibm.com/cs/crawler
http://www.almaden.ibm.com/cs/crawler [rc1.wf.ibm.com]
http://www.almaden.ibm.com/cs/crawler [rc2.wf.ibm.com]
http://www.almaden.ibm.com/cs/crawler [bc13]
http://www.almaden.ibm.com/cs/crawler [bc14]
http://www.almaden.ibm.com/cs/crawler [bc15]
http://www.almaden.ibm.com/cs/crawler [bc16]
http://www.almaden.ibm.com/cs/crawler [bc18]
http://www.almaden.ibm.com/cs/crawler [bc2]
http://www.almaden.ibm.com/cs/crawler [bc26]
http://www.almaden.ibm.com/cs/crawler [bc34]
http://www.almaden.ibm.com/cs/crawler [bc35]
http://www.almaden.ibm.com/cs/crawler [bc7]
http://www.almaden.ibm.com/cs/crawler [bc8]
http://www.almaden.ibm.com/cs/crawler [bc9]
http://www.almaden.ibm.com/cs/crawler [bv2m304]
http://www.almaden.ibm.com/cs/crawler [c01]
http://www.almaden.ibm.com/cs/crawler [c11]
http://www.almaden.ibm.com/cs/crawler [c12]
http://www.almaden.ibm.com/cs/crawler [fc15]
http://www.almaden.ibm.com/cs/crawler [fc2]
http://www.almaden.ibm.com/cs/crawler [fc3]
http://www.almaden.ibm.com/cs/crawler [fc4]
http://www.almaden.ibm.com/cs/crawler [fc7]
http://www.almaden.ibm.com/cs/crawler [fc8]
http://www.almaden.ibm.com/cs/crawler [fc9]
http://www.almaden.ibm.com/cs/crawler [hc2]
http://www.almaden.ibm.com/cs/crawler [hc3]
http://www.almaden.ibm.com/cs/crawler [hc5]
http://www.almaden.ibm.com/cs/crawler [wf223]
http://www.almaden.ibm.com/cs/crawler [wf84]
http://www.almaden.ibm.com/cs/crawler/
http://www.almaden.ibm.com/cs/crawler/ [v08l:odrayab:9.1.64.25]
[Mozilla/4.0 (compatible; MSIE 4.01; Vonna.com b o t)]
Vonna.com
Did not read robots.txt.
[ccubee/*]
ccubee
From their website: [The product] integrates functionality of all significant areas of information processing that are focused on the content and the structure analysis of publicly available or internal resources.
ccubee/3.0
ccubee/3.1
ccubee/3.2
ccubee/3.5
[XML Sitemaps Generator*]
XML Sitemaps Generator
Apparently this is the user agent you’ll see when creating an XML sitemap via Google, and perhaps others.
[Mozilla/5.0 (compatible; Kyluka crawl; http://www.kyluka.com/crawl.html; crawl@kyluka.com)]
Kyluka
Questionable practices but for now they’re not banned, just requested to go away in robots.txt.
[QuickFinder Crawler]
QuickFinder
No robots.txt. The IP Address I saw belonged to Novell, Inc.
QuickFinder Crawler
[Links4US-Crawler,*]
Links4US-Crawler
Claims to use data from DMOZ.org, so why are they crawling themselves? Especially without reading robots.txt!
Links4US-Crawler, ( http://links4us.com/)
[SynapticSearch/AI Crawler 1.?]
SynapticSearch
A search quickly turned up synapticsearch.com which redirects to alpha-leonis.lids.mit.edu/ss/. It’s a distributed crawler that is really not ready for prime-time. Most of the links on their website are broken. There is no contact information.
I’m not sure what to do with this one yet. For now I’m adding it to browscap as a general crawler and flagging it as isBanned.
If someone from this project at MIT can contact me I’d sure appreciate it. I don’t want to ban this bot, but unless it learns how to behave, like reading and respecting robots.txt, that’ll be my only option.
[WebFilter Robot*]
WebFilter Robot
From their website: WebFilter is an intelligent agent that filters the new pages announced on the NCSA What’s New Page, looking for Web resources that match your ongoing interests.
WebFilter Robot 1.0
[WebTrends/*]
WebTrends
Stuff related to WebTrends reports.
WebTrends/3.0 (WinNT)
[Willow Internet Crawler by Twotrees V*]
Willow Internet Crawler
Willow Internet Crawler by Twotrees. Content Filtering.
Willow Internet Crawler by Twotrees V2.1
[BravoBrian BStop*]
BravoBrian BStop
Dutch model car website directory
BravoBrian BStop
BravoBrian bstop.bravobrian.it
[CJNetworkQuality; http://www.cj.com/networkquality]
CJNetworkQuality
It appears to read and respect robots.txt. But it’s still a controversial crawler so some folks may want to move it to Website Strippers. See the discussion on Webmaster World.
CJNetworkQuality; http://www.cj.com/networkquality
[n4p_bot*]
n4p_bot
From their website: This is a peer-to-peer protocol for distributing files. It makes use of the upstream bandwidth of every downloader to increase the effectiveness of the distribution as a whole, and to gain advantage on the part of the downloader. The term they use to describe this is “torrents” as in BitTorrent software.
n4p_bot (crawler@n4p.com)
n4p_bot crawler@n4p.com
[semanticdiscovery/*]
Semantic Discovery
From their website: The Semantic Discovery robot collects content from the web to be matched into focused “product and service” taxonomies and then published in multiple search engine directories.
semanticdiscovery/0.2(http://www.semanticdiscovery.com/sd/robot.html)
semanticdiscovery/0.3(http://www.semanticdiscovery.com/sd/robot.html
semanticdiscovery/0.4(http://www.semanticdiscovery.com/sd/robot.html
[UbiCrawler/*]
UbiCrawler
Yet another university project of some kind.
[UCmore]
UCmore
This is a toolbar for IE.
UCmore
[niXXieBot?Foster*]
niXXiebot-Foster
Claims to be the first contextual advertising company in the UK. Their bot was very abusive. For starters it read robots.txt before every single one of the 36,000 pages it crawled. Also, while it was crawling with this user agent it was also crawling from the same IP Address using niXXieBot-Foster as the user agent.
[SMBot/*]
SMBot
Appears to be a tool offered by Amazon.com.
[searchbot admin@google.com]
searchbot
Trying to spoof Google and doing a bad job of it!
searchbot admin@google.com
[PhpDig/*]
PhpDig
This is the default user agent for PhpDig. Usually a client will modify this user agent to reflect their own search engine. From the PhpDig website: PhpDig is a PHP and MySQL web spider and search engine, released under the GNU General Public License.
PhpDig/PHPDIG_VERSION ( http://www.phpdig.net/robot.php)
[eventax/*]
eventax
Searches for online events, mostly in Germany.
eventax/1.3 (eventax; http://www.eventax.de/; info@eventax.de)
[Tecomi Bot (http://www.tecomi.com/bot.htm)]
Tecomi
Bot page does not exist. Site is under development.
Tecomi Bot (http://www.tecomi.com/bot.htm)
[dragonfly(ebingbong#playstarmusic.com)]
eBingBong
Did not request robots.txt.
http://www.ebingbong.com/
[htdig/*]
ht://Dig
From their website: a complete indexing and searching system for a domain or intranet.
htdig/3.1.2 (webmaster@neurovia.umn.edu)
htdig/3.1.6 (romieu@bastide-medical.fr)
htdig/3.1.6 (unconfigured@htdig.searchengine.maintainer)
htdig/3.1.6 (webmaster@choiceoneonline.com)
[ArachnetAgent*]
General Crawlers
This appears to be related to the TuringOS crawler.
ArachnetAgent 2.3
[grub crawler]
grub crawler
From their website: Leveraging the power of distributed computing, Grub allows everyone with an Internet connection to participate in the last frontier of discovery. By downloading the unique screensaver, you can donate your computer’s unused bandwidth to probing the hidden depths of the Web.
grub crawler
[Mozilla/4.0 (compatible; N-Stealth)]
N-Stealth
From their website: N-Stealth is a vulnerability-assessment product that scans web servers to identify security problems and weaknesses that may allow an attacker to gain privileged access.
Mozilla/4.0 (compatible; N-Stealth)
[Lincoln State Web Browser]
Lincoln State Web Browser
Does not read robots.txt.
Lincoln State Web Browser
[Seeker.lookseek.com]
LookSeek
Does not read robots.txt.
Seeker.lookseek.com
[DTAAgent]
DTAAgent
User agent contained no details about what website it’s from or what it’s doing. It read robots.txt. An rDNS lookup returned an error. All I know is the IP Address is RIPE and appears to belong to a German ISP/Host.
DTAAgent
[nicebot]
nicebot
This bot has a mixed reputation. In some cases it respects robots.txt. In other cases it doesn’t bother reading robots.txt.
nicebot
[ShopWiki/1.0*]
ShopWiki
Crawler for ShopWiki website. It appears to read and respect robots.txt.
ShopWiki/1.0 ( http://www.shopwiki.com/)
ShopWiki/1.0 ( http://www.shopwiki.com/wiki/Help:Bot)
[Mozilla/5.0 (compatible; Vermut*)]
Vermut
From their website: Vermut is a web crawler which collects web content for general analysis and building of search indexes. It appears to be part of AOL, but I can’t find absolute proof of that.
[HTTP/1.0]
HTTP/1.0
Did not request robots.txt. IP Address resolves to opticaljungle.com. There is no website at that URL.
HTTP/1.0[OpenTaggerBot (http://www.opentagger.com/opentaggerbot.htm)]
OpenTaggerBot
Social bookmarking site.
OpenTaggerBot (http://www.opentagger.com/opentaggerbot.htm)[Tagyu Agent/1.0]
Tagyu
Converts text or a URL to tags.
Tagyu Agent/1.0[Visicom Toolbar]
Visicom Toolbar
An IE toolbar made with Visicom Media Dynamic Toolbar software.
Visicom Toolbar[RixBot (http://babelserver.org/rix)]
RixBot
Some sort of search engine for REBOL-related scripts and news.
RixBot (http://babelserver.org/rix)[Mozilla/4.1]
General Crawlers
No robots.txt.
Mozilla/4.1[Mozilla Compatible (MS IE 3.01 WinNT)]
General Crawlers
The user agent is just too old and odd to be a real browser. That, combined with the fact that it ripped valuable content from one of my websites without even reading robots.txt, makes me mad. That’s why it’s banned.
Mozilla Compatible (MS IE 3.01 WinNT)[SurveyBot/*]
SurveyBot
Domain availability checker. It’s unclear why they need to probe my sites each week when other whois services don’t need to. Plus I get no traffic from them at all. So they’re banned.
SurveyBot/2.2 Whois Source
SurveyBot/2.3 (Whois Source)[Search Fst]
Search Fst
Seems to follow behind human-powered user agents indexing pages the person has just visited. It never reads robots.txt. The company behind it is an engineering firm called Fay, Spofford & Thorndike, Inc.
Search Fst[sohu*]
sohu-search
Some sort of Chinese crawler. No robots.txt.
sohu agent
sohu-search[mozilla/5.0 (compatible; genevabot http://www.healthdash.com)]
Healthdash
From their website: Healthdash is the fastest and easiest way to find, understand and manage information about consumer health.
mozilla/5.0 (compatible; genevabot http://www.healthdash.com)[botlist]
botlist
This bot did not read robots.txt. The information on file for the IP Address appears to be spoofed.[shelob v1.*]
shelob
No robots.txt.[Gaisbot*]
Gaisbot
From their website: Gaisbot is the agent software of GAIS which crawls web sites all over the world, in order to build a search engine like google or altavista.[BruinBot*]
BruinBot
From their website: In the WebArchive project, we are interested in building a Web search engine prototype, where the users can ask for different versions of pages collected during different periods of time.
BruinBot ( http://webarchive.cs.ucla.edu/bruinbot.html)[CacheabilityEngine/*]
CacheabilityEngine
From their website: To help you understand how Web Caches will treat a Web page, the Cacheability Engine will look at a URL (and optionally any images or objects associated with it), giving both specific cache-related data about it, and a general commentary on how cacheable the object is.
CacheabilityEngine/1.30[InternetLinkAgent/*]
InternetLinkAgent
It appears to be a piece of free Japanese software that searches multiple search engines and sorts them for you.
InternetLinkAgent/3.1[Nudelsalat/*]
Nudelsalat
Noodle salad? It didn’t read robots.txt so it’s banned.
Nudelsalat/5.3 (Windoofs eNTe)[WhizBang]
WhizBang
Corporate Information Crawler[TheInformant*]
TheInformant
Similar to WebTrends.[Patwebbot (http://www.herz-power.de/technik.html)]
Patwebbot
Some type of crawler from Germany. As best I could tell from the site it’s just someone who wrote a bot to crawl the web with no real purpose in mind.
Patwebbot (http://www.herz-power.de/technik.html)[JetBrains*]
Omea Pro
From their website: Omea Pro is a powerful universal client for aggregating and organizing all kinds of information: emails, files, web links, RSS feeds, newsgroups, tasks, contacts, and even custom resource types that you define.
JetBrains Omea Pro 1.0.3 (http://www.jetbrains.com/omea/)
JetBrains Omea Pro 2.0 Release Candidate 5 (http://www.jetbrains.com/omea/)[nabot*]
Nabot
Run by Korea Telecom.[moget/*]
Goo
It is part of the ‘InfoBee’ project. It is closely related to the regular Inktomi db but is branded as an alternative db. It grabs too many pages in a short period of time, which is why it’s in this category.
moget/2.1 (moget@goo.ne.jp)[Ocelli/*]
Ocelli
From their website: Ocelli is a Web crawler owned and operated by GlobalSpec®, the leading specialized search engine and information resource for the engineering community. Ocelli’s mission is to find and index web pages for The Engineering Web from GlobalSpec, a unique slice of the World Wide Web focusing solely on engineering and technical content. Based on discussions I’ve seen on WebmasterWorld this spider is not very good at finding the niche content it claims to be searching for. I have banned it from my sites which have nothing to do with engineering, unless you want to count plastic model cars as being engineering related!
Ocelli/1.2 (http://www.globalspec.com/Ocelli)
Ocelli/1.3 (http://www.globalspec.com/Ocelli)[MapoftheInternet.com?(?http://MapoftheInternet.com)]
MapoftheInternet
Does not read robots.txt.
MapoftheInternet.com ( http://MapoftheInternet.com)[Webclipping.com]
Webclipping.com
From their website: WebClipping provides clients with news, information, and rumors from every key online source that impacts their business. With critical information collected and delivered to them, decision-makers can spot threats and opportunities in time to act effectively while saving hours of manual research.
Webclipping.com[Mozdex/0.7.2*]
Mozdex
URL in user agent is 404. From their website: mozDex is a search engine seeded from the dmoz.org directory. mozDex uses open source search technologies to create an open and fair index.
Mozdex/0.7.2 (Mozdex; http://www.mozdex.com/bot.html; spider@mozdex.com)
Mozdex/0.7.2-dev (Mozdex; http://www.mozdex.com/bot.html; spider@mozdex.com)[NetCarta_WebMapper/*]
NetCarta_WebMapper
Does not read robots.txt. Takes pages too quickly.[Clushbot/*]
Clushbot
It still does not request robots.txt.
Clushbot/3.1-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.13-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.16-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.18-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.2-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.21-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.23-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.24-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.3-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.31-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.33-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.38-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.41-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.42-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.47-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.48-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.49-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.5-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.50-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.52-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.53-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.57-Ajax ( http://www.clush.com/bot.html)
Clushbot/3.58-Ajax ( http://www.clush.com/bot.html)
Clushbot/3.59-Hector ( http://www.clush.com/bot.html)
Clushbot/3.6-BinaryFury ( http://www.clush.com/bot.html)
Clushbot/3.60-Peleus ( http://www.clush.com/bot.html)
Clushbot/3.62-Laomedon ( http://www.clush.com/bot.html)
Clushbot/3.9-BinaryFury ( http://www.clush.com/bot.html)[VengaBot/*]
VengaBot
Did not read robots.txt. Appears to be a crawler for the Dutch CMS, Caret Web Content Management. The IP Address is registered to them.
VengaBot/1.00; Mozilla/5.0; Firefox/1.0.6 (X11; Linux i686; es)[SBIder/*]
SiteSell
From their website: SiteSell is gathering a statistical representation of topics presented on the Web as a whole. Each Web page visited is categorized under the topics that it represents, allowing our customers to know the percentage of Web pages that are about any particular topic.
SBIder/0.7 (SBIder; http://www.sitesell.com/sbider.html; http://support.sitesell.com/contact-support.html)
SBIder/0.8-dev (SBIder; http://www.sitesell.com/sbider.html; http://support.sitesell.com/contact-support.html)[*Networking4all*]
Networking4all Bot
German ISP that can’t seem to figure out redirects and how to make a bot retrieve my files so it’s banned for hogging too many CPU cycles.
SSLbot/1.0 (http://www.networking4all.com)
verzamelgids.nl - Networking4all Bot/1.5
verzamelgids.nl - Networking4all Bot/2.1[Miva (AlgoFeedback@miva.com)]
Miva
Did not read robots.txt. From their website: Today we offer a range of products and services through our three industry-facing divisions - MIVA Media, MIVA Small Business and MIVA Direct - aimed at significantly enhancing an advertiser’s ability to improve ROI, further minimizing waste and uncertainty.
Miva (AlgoFeedback@miva.com)[Mozilla/4.0 (compatible; MyFamilyBot/*)]
MyFamilyBot
This is apparently the parent company of Ancestry.com and other such sites. What are they crawling my sites for? And why are they taking disallowed files?
Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)
Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.ancestry.com/learn/bot.aspx)
Mozilla/4.0 (compatible; MyFamilyBot/1.0; http://www.myfamilyinc.com)[favorstarbot/*]
favorstarbot
Didn’t read robots.txt until well into its crawl.
http://favorstar.com/bot.html[metatagsdir/*]
metatagsdir
Does not read robots.txt.
http://metatagsdir.com/
General RSS
[Mozilla/5.0 (compatible) GM RSS Panel]
RSS Panel
From their website: RSS Panel is designed as a generic Greasemonkey user script for any website. Its purpose is to display a little floating panel at the left hand top of any web page, for which a RSS feed is available from the same domain.
http://www.xs4all.nl/~jlpoutre/BoT/Javascript/RSSpanel/[Mozilla/5.0 http://www.inclue.com; graeme@inclue.com]
Inclue
Inclue supposedly went out of business. I’m not sure what purpose this bot serves. It did not read robots.txt.
http://www.inclue.com/
Google
[googlebot-urlconsole]
googlebot-urlconsole
This is Google’s service for requesting that they remove a URL from their index.[Mozilla/4.0 (compatible; GoogleToolbar*)]
Google Toolbar
Pre-fetches links and wastes a lot of bandwidth. My bandwidth, so of course Google doesn’t care. But I do. Banned.
Mozilla/4.0 (compatible; GoogleToolbar 1.1.70-deleon; Windows 2000 5.0)
Mozilla/4.0 (compatible; GoogleToolbar 2.0.111-big; Windows XP 5.1)
Mozilla/4.0 (compatible; GoogleToolbar 2.0.114.10-deleon; Windows 98 SE 4.10)
Mozilla/4.0 (compatible; GoogleToolbar 3.0.128.1-big; Windows XP 5.1)
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-big; Windows XP 5.1)
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-big; Windows XP 5.1; Google-TR-1)
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-big; Windows XP 5.1; Google-TR-3)
Mozilla/4.0 (compatible; GoogleToolbar 3.0.131.0-deleon; Windows Me 4.90)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.2378-big; Windows XP 5.1; MSIE 6.0.2900.2180)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows 2000 5.0; MSIE 6.0.2800.1106)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows 5.2; MSIE 6.0.3790.1830)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 6.0.2600.0000)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 6.0.2800.1106)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 6.0.2900.2180)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 7.0.5450.4)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5266-big; Windows XP 5.1; MSIE 7.0.5700.6)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.1019.5764-big; Windows XP 5.1; MSIE 6.0.2900.2180)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.513.2948-big; Windows XP 5.1; MSIE 6.0.2900.2180)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.629.4924-big; Windows XP 5.1; MSIE 6.0.2900.2180)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.629.4924-big; Windows XP 5.1; MSIE 7.0.5450.4)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.629.4924-big; Windows XP 5.1; MSIE 7.0.5700.6)
Mozilla/4.0 (compatible; GoogleToolbar 4.0.917.1454-big; Windows XP 5.1; MSIE 6.0.2900.2180)[Feedfetcher-Google;*]
Feedfetcher-Google
Feedfetcher is how Google grabs RSS or Atom feeds when users choose to add them to their Google homepage.
Feedfetcher-Google; ( http://www.google.com/feedfetcher.html)
Hatena
[Hatena Bookmark/*]
Hatena Bookmark
Appears to be a Japanese link checker and bookmarks manager.
Hatena Bookmark/0.1
Hatena Bookmark/0.1 (http://b.hatena.ne.jp; 1 users)
HTML Validators
[Weblide/2.0 beta8*]
Weblide
XHTML XML validator.
Weblide/2.0 beta8 (http://alexandre.alapetite.net/distribution/weblide/; Microsoft Windows NT 5.1.2600 Service Pack 2; .NET 2.0.50727.42; fr-FR; FRA)[W3C_Validator/*]
W3C Validator
W3C’s HTML Validation Service
W3C_Validator/1.183 libwww-perl/5.64
W3C_Validator/1.305 libwww-perl/5.64
W3C_Validator/1.305.2.109 libwww-perl/5.79
W3C_Validator/1.305.2.12 libwww-perl/5.64
W3C_Validator/1.305.2.137 libwww-perl/5.79
W3C_Validator/1.305.2.148 libwww-perl/5.800
W3C_Validator/1.305.2.148 libwww-perl/5.803
W3C_Validator/1.432.2.10[Jigsaw/* W3C_CSS_Validator_JFouffa/*]
Jigsaw CSS Validator
Similar to the W3C HTML validator except it’s for CSS.
Jigsaw/2.2.0 W3C_CSS_Validator_JFouffa/2.0
Jigsaw/2.2.3 W3C_CSS_Validator_JFouffa/2.0
Hurricane Electric
[Mozilla/5.0 (Twiceler-*]
Twiceler
Part of Hurricane Electric.
http://www.cuill.com/twiceler/robot.html[Twiceler*]
Twiceler
Claims to be an experimental bot. It’s actually yet another crawler from the disgusting depths of Hurricane Electric.
http://www.cuill.com/twiceler/
Twiceler www.cuill.com/robots.html[Mozilla/4.04 (compatible; Dulance bot;*)]
Dulance
No robots.txt. From their website: Dulance is a completely automated price comparison engine covering virtually all online merchants in North America.
Mozilla/4.04 (compatible; Dulance bot; http://www.dulance.com/bot.jsp)
iaskspider
[iaskspider]
iaskspider
This is not from iask.com.cn.
Iceweasel
[Iceweasel]
Iceweasel
IceWeasel is the GNU version of the Firefox browser.
http://www.gnu.org/software/gnuzilla/
IE 6.0
[Mozilla/4.0 (compatible; MSIE 6.0; *Windows NT 6.0;*.NET CLR 2*)*]
IE
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 6.0; .NET CLR 2.0.31113)
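The bracketed lines throughout this list are wildcard patterns rather than literal user agents. As far as I understand the browscap convention, `*` stands for any run of characters and `?` for exactly one character. Here is a minimal Python sketch of how such a pattern could be tested against a user agent string. The function names are my own, and the case-insensitive matching is an assumption, not something the list itself states.

```python
import re


def browscap_pattern_to_regex(pattern):
    """Compile a browscap-style pattern ('*' = any run, '?' = one char)."""
    escaped = re.escape(pattern)
    # re.escape turns '*' into '\*' and '?' into '\?'; swap those back
    # into their regex wildcard equivalents. Everything else stays literal,
    # including brackets and parentheses that appear in real user agents.
    regex = escaped.replace(r"\*", ".*").replace(r"\?", ".")
    # Case-insensitive matching is an assumption of this sketch.
    return re.compile(regex, re.IGNORECASE | re.DOTALL)


def matches(pattern, user_agent):
    """Return True if the whole user agent string fits the pattern."""
    return browscap_pattern_to_regex(pattern).fullmatch(user_agent) is not None


IE6_VISTA = "Mozilla/4.0 (compatible; MSIE 6.0; *Windows NT 6.0;*.NET CLR 2*)*"
ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 6.0; .NET CLR 2.0.31113)"
print(matches(IE6_VISTA, ua))
```

Converting to a regular expression by hand, instead of using a shell-style glob helper, keeps characters such as `[en]` in real user agents from being read as character classes.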
IE 7.0
[Mozilla/4.0 (compatible; MSIE 7.0; *Windows NT 6.0;*.NET CLR 2*)*]
IE
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; .NET CLR 2.0.50727; SL Commerce Client v1.0; Media Center PC 5.0)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04306; Media Center PC 5.0)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04320)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727; FDM)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727; InfoPath.1)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; MSDigitalLocker Vista 1.3; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; WinFX RunTime 3.0.50727; InfoPath.2)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.0.04320)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04320)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04320; InfoPath.2)
IE 7.0b
[Mozilla/4.0 (compatible; MSIE 7.0b; *Windows NT 6.0;*.NET CLR 2*)*]
IE
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50215; SL Commerce Client v1.0; Tablet PC 2.0)
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50215; SL Commerce Client v1.0; Tablet PC 2.0; Avalon 6.0.4030)
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50215; SL Commerce Client v1.0; Tablet PC 2.0; Avalon 6.0.4030; WinFX RunTime 1.0.50215)
Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0; .NET CLR 2.0.50727; SL Commerce Client v1.0; Tablet PC 2.0; Media Center PC 3.1; Media Center PC $(runtime.Emerald_version))
Ilse
[Ilse]
Ilse
Dutch search engine. Their bot appears to be well-behaved. They get good comments on WMW.
Image Crawlers
[Mozilla/5.0 (Macintosh; U; *Mac OS X; *) AppleWebKit/* (*) Pandora/2.*]
Pandora
From their website: The image collector’s web spider and search agent for Mac OS X.
http://www.positivespinmedia.com/shareware/Pandora/[HTML2JPG Blackbox, http://www.html2jpg.com]
HTML2JPG
Takes screenshots of websites which is nice. The downside is the program can run in batch mode which makes it a potential image ripper.
HTML2JPG Blackbox, http://www.html2jpg.com[Camcrawler*]
Camcrawler
No robots.txt. From their website: The data collected from the crawler is used to find and index webcam pages and images all over the internet.[pixfinder/*]
pixfinder
Image stealer.[*PhotoStickies/*]
PhotoStickies
Used for grabbing webcam images, often against website TOS.[rssImagesBot/0.1 (*http://herbert.groot.jebbink.nl/?app=rssImages)]
rssImagesBot
Herbert Jebbink’s Website. Image bot does not read robots.txt.
rssImagesBot/0.1 ( http://herbert.groot.jebbink.nl/?app=rssImages)
Inktomi
[Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)]
Yahoo! Slurp China
I wish I could ban this bot, but it uses the same robots name as all the other Inktomi bots!
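The reason a single bot can't be singled out is that robots.txt records are keyed on a robots token, not on the full user agent string. Below is a rough sketch using Python's standard `urllib.robotparser`. The token `Slurp`, the rule text and the example.com URL are assumptions for illustration, and real crawlers may match tokens differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to shut out Slurp entirely.
rules = [
    "User-agent: Slurp",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)

url = "http://example.com/page.html"  # placeholder URL

# This parser does a case-insensitive substring match on the robots token,
# so both names below fall under the same "Slurp" record.
blocked_main = not parser.can_fetch("Yahoo! Slurp", url)
blocked_china = not parser.can_fetch("Yahoo! Slurp China", url)

# A bot with an unrelated name is not covered by the record at all.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(blocked_main, blocked_china, other_allowed)
```

In other words, a rule written for the shared token hits every bot that answers to it, so banning the China crawler through robots.txt alone would ban the others too.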
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)[YahooSeeker/*]
YahooSeeker
This is Yahoo’s user agent for indexing mobile content.
YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)
YahooSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://search.yahoo.com/yahooseeker.html)
YahooSeeker/1.1 (compatible; Mozilla 4.0; MSIE 5.5; http://help.yahoo.com/help/us/shop/merchant/)
YahooSeeker/1.2 (compatible; Mozilla 4.0; MSIE 5.5; yahooseeker at yahoo-inc dot com ; http://help.yahoo.com/help/us/shop/merchant/)[Mozilla/4.0 (compatible; Yahoo Japan; for robot study; kasugiya)]
Yahoo! RobotStudy
Did not read robots.txt which is typical of Y! bots these days.
Mozilla/4.0 (compatible; Yahoo Japan; for robot study; kasugiya)[Yahoo Pipes*]
Yahoo Pipes
Pipes is a hosted service that lets you remix feeds and create new data mashups in a visual programming environment.
http://pipes.yahoo.com/
Internet Archive
[*heritrix*]
Heritrix
From the website: Heritrix is the Internet Archive’s web crawler which was specially designed for web archiving. Me again: It’s available to anyone who wants to download it and abuse it. That’s why I’ve banned it.
http://en.wikipedia.org/wiki/Heritrix
http://crawler.archive.org/
mozilla/5.0 (compatible; heritrix/1.0.4 http://non-exist.com)
mozilla/5.0 (compatible; heritrix/1.2.0 http://lab.mokk.bme.hu/members/bridge/)
mozilla/5.0 (compatible; heritrix/1.3.0 http://archive.crawler.org)
mozilla/5.0 (compatible; heritrix/1.3.0 http://crawler.archive.org)
Mozilla/5.0 (compatible; heritrix/1.3.0 http://www.l3s.de/)
Mozilla/5.0 (compatible; heritrix/1.4.0 http://www.chepi.net)
Mozilla/5.0 (compatible; heritrix/1.4.0 PROJECT_URL_HERE)
Mozilla/5.0 (compatible; heritrix/1.5 http://www.metacarta.com)
Mozilla/5.0 (compatible; heritrix/1.5.0 http://www.l3s.de/~kohlschuetter/projects/crawling/)
Mozilla/5.0 (compatible; heritrix/1.6.0 http://innovationblog.com)
Mozilla/5.0 (compatible; heritrix/1.8.0 http://wiki.office.aol.com/wiki/SEO)
os-heritrix/0.5.0 ( http://crawler.archive.org)[InternetArchive/*]
InternetArchive
Unsure exactly what this new user agent is doing. Some report it’s disrespectful of robots.txt. On my sites it’s been well behaved.
InternetArchive/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
iSiloX
[iSiloX]
iSiloX
From their website: iSiloX is the desktop application that converts content to the iSilo 3.x/4.x document format, enabling you to carry that content on your Palm OS PDA, Pocket PC PDA, Windows CE Handheld PC, or Windows computer for viewing using iSilo.
iVia Project
[iVia Project]
iVia Project
From their website: The iVia software is used in a range of projects, including the iVia Virtual Library Software which creates and manages Virtual Libraries, both automatically, and under the direct control of living, breathing, human librarians. iVia can be used to download pages from other Web sites.
http://ivia.ucr.edu/useragents.shtml
Jakarta Project
[Jakarta Project]
Jakarta Project
From Wikipedia: The Jakarta Project creates and maintains open source software for the Java platform. It operates as an umbrella project under the auspices of the Apache Software Foundation, and all of Jakarta products are released under the Apache License.
http://jakarta.apache.org/
Jayde Online
[exactseek-pagereaper-* (crawler@exactseek.com)]
exactseek-pagereaper
ExactSeek is a Meta-tag search engine. Your site will not be added if it does not have Title and Meta Description tags.
exactseek-pagereaper-2.63 (crawler@exactseek.com)[ExactSeek Crawler/*]
ExactSeek Crawler
ExactSeek is a Meta-tag search engine. Your site will not be added if it does not have Title and Meta Description tags.
K-Meleon
[K-Meleon]
K-Meleon
K-Meleon is a lightweight, Gecko-based web browser for Windows.
Konqueror
[Konqueror]
Konqueror
I am only supporting valid Konqueror user agents. If you’ve modified your user agent so it no longer matches the standard please don’t complain to me about it.
Link Checkers
[*Zeus*]
Zeus
From their website: Using link-building programs that query or use the search engines for finding web sites, data or information can penalize or even get your web site banned.
Zeus 14530 Webster Pro V2.9 Win32
Zeus 15180 Webster Pro V2.9 Win32
Zeus 15355 Webster Pro V2.9 Win32
Zeus 19850 Webster Pro V2.9 Win32
Zeus 2.6
Zeus 21628 Webster Pro V2.9 Win32
Zeus 27567 Webster Pro V2.9 Win32
Zeus 30979 Webster Pro V2.9 Win32
Zeus 31264 Webster Pro V2.9 Win32
Zeus 35520 Webster Pro V2.9 Win32
Zeus 40201 Webster Pro V2.9 Win32
Zeus 43271 Webster Pro V2.9 Win32
Zeus 47063 Webster Pro V2.9 Win32
Zeus 47844 Webster Pro V2.9 Win32
Zeus 49814 Webster Pro V2.9 Win32
Zeus 50267 Webster Pro V2.9 Win32
Zeus 54093 Webster Pro V2.9 Win32
Zeus 68378 Webster Pro V2.9 Win32
Zeus 73457 Webster Pro V2.9 Win32
Zeus 7393 Webster Pro V2.9 Win32
Zeus 75505 Webster Pro V2.9 Win32
Zeus 7913 Webster Pro V2.9 Win32
Zeus 86701 Webster Pro V2.9 Win32
Zeus 95389 Webster Pro V2.9 Win32
Zeus 96481 Webster Pro V2.9 Win32
Zeus ThemeSite Viewer Webster Pro V2.9 Win32
Zeusbot/0.07 (Ulysseek’s web-crawling robot; http://www.zeusbot.com; agent@zeusbot.com)[!Susie (http://www.sync2it.com/susie)]
!Susie
Social bookmarking: what a stupid phrase! This is a link checker. See also just plain Susie in this same section.
!Susie (http://www.sync2it.com/susie)[Bookdog/*]
Bookdog
From their website: Bookdog can sort, organize, eliminate duplicates, automatically verify, migrate and synchronize bookmarks between Safari, Camino, Firefox, OmniWeb and Opera.
http://www.sheepsystems.com/products/bookdog/[JRTwine Software Check Favorites Utility]
JRTwine
This bot is checking my downloads page which is a violation of my TOS so it’s banned.[FavOrg]
FavOrg
Favorites Manager PC Magazine utility
FavOrg[RPT-HTTPClient/*]
RPT-HTTPClient
Not sure what this is doing but it didn’t read robots.txt first. Usually you see this agent at the end of an agent string along with something like JCheckLinks.
RPT-HTTPClient/0.3-3
RPT-HTTPClient/0.3-3E[Link Valet Online*]
Link Valet
From their website: Link Valet is a WWW Link checker. When you enter the URL of an HTML page on the Web, it will fetch the page, and print a report on it. Link Valet will also spider your site.
Link Valet Online 1.1[CheckLinks/*]
CheckLinks
This does more than the name implies. It can strip entire websites.[Funnel Web Profiler*]
Funnel Web Profiler
A legitimate site mapping tool that is often abused.[Mozilla/4.0 (compatible; SuperCleaner*;*)]
SuperCleaner
Finds and removes websites from your Favorites list that are no longer working.
Mozilla/4.0 (compatible; SuperCleaner 2.57; Windows NT 5.1)
Mozilla/4.0 (compatible; SuperCleaner 2.67; Windows NT 5.1)
Mozilla/4.0 (compatible; SuperCleaner 2.75; Windows NT 5.1)
Mozilla/4.0 (compatible; SuperCleaner 2.84; Windows NT 5.1)
Mozilla/4.0 (compatible; SuperCleaner 2.90; Windows NT 5.1)
Mozilla/4.0 (compatible; SuperCleaner 2.93; Windows NT 5.1)[VSE/*]
VSE Link Tester
The section of the user agent in parentheses contains custom text entered by each user of the product.
VSE/1.0 (testcrawler@hotmail.com)
VSE/1.0 (testcrawler@vivisimo.com)
VSE/1.0 (vivisimolog@web121.com)
VSE/1.0 (vsecrawler@hotmail.com)[Xenu* Link Sleuth*]
Xenu’s Link Sleuth
This is, or at least can be, a very disrespectful and harmful link checker.
Xenu Link Sleuth 1.1f
Xenu Link Sleuth 1.2a
Xenu Link Sleuth 1.2b
Xenu Link Sleuth 1.2d
Xenu Link Sleuth 1.2e
Xenu Link Sleuth 1.2f
Xenu Link Sleuth 1.2g
Xenu Link Sleuth 1.2h
Xenu’s Link Sleuth 1.0p
Xenu’s Link Sleuth 1.1c[SiteBar/*]
SiteBar
This is a SourceForge bookmarks manager.
SiteBar/3.2.6
SiteBar/3.3.2 (Bookmark Server; http://sitebar.org/)
SiteBar/3.3.3 (Bookmark Server; http://sitebar.org/)
SiteBar/3.3.5 (Bookmark Server; http://sitebar.org/)[Z-Add Link Checker*]
Z-Add Link Checker
Web page that lets you check a URL.
Z-Add Link Checker (http://w3.z-add.co.uk/linkcheck/)[Mozilla/4.0 (compatible; smartBot/1.*; checking links; *)]
smartBot
The UA indicates it’s a link checker. On my sites all it did was send HEAD requests for my sitemaps.
http://www.smartbot.com.au/[DocWeb Link Crawler (http://doc.php.net)]
DocWeb Link Crawler
PHP’s documentation link checker.
DocWeb Link Crawler (http://doc.php.net)[Mozilla/5.0 gURLChecker/*]
gURLChecker
From their website: gURLChecker is a graphical web links checker for GNU/Linux and other POSIX OS. It can work on a whole site, a single local page or a browser bookmarks file. From my perspective it’s being used to automate checks of my downloads page which is a violation of my TOS so it’s banned.
Mozilla/5.0 gURLChecker/0.8.0 ssl (Linux)[Mozilla/4.0 (Compatible); URLBase*]
URLBase
Bookmarks manager. Seems like a nice product but it’s being used to check my downloads page in violation of my TOS.
Mozilla/4.0 (Compatible); URLBase 6[Mozilla/4.0 (compatible; Link Utility; http://net-promoter.com)]
NetPromoter Link Utility
From their website: Link Utility is a powerful site management and link checker tool that helps webmasters automate the process of web site testing.
Mozilla/4.0 (compatible; Link Utility; http://net-promoter.com)[Susie (http://www.sync2it.com/bms/susie.php]
Susie
Social bookmarking website. From their website: Susie, Sync2It’s automated librarian, visits each of the websites bookmarked by our active user community right after it is uploaded to our server.
Susie (http://www.sync2it.com/bms/susie.php[onCHECK Linkchecker von www.scientec.de fuer www.onsinn.de]
onCHECK Linkchecker
Seems to be a link checker but the only information I can find is in German.
onCHECK Linkchecker von www.scientec.de fuer www.onsinn.de[Robozilla/*]
Robozilla
Visits sites listed in ODP to verify they’re still functional.
Robozilla/1.0[ActiveBookmark *]
ActiveBookmark
From their website: Main feature of Active Bookmark is ability make bookmark to concrete place of the page.
ActiveBookmark 1.0
ActiveBookmark 1.1
Lycoris Desktop/LX
[Lycoris Desktop/LX]
Lycoris Desktop/LX
Lycoris bills itself as a Linux-based but easy to use desktop alternative to Windows including a Mozilla-based web browser.
Media Players
[vobsub]
vobsub
vobsub is a plug-in for VirtualDub that allows you to rip subtitles from DVD VOB files and to use the provided DirectShow filter for DivX playback with subtitles.
Microsoft
[Microsoft BITS/*]
BITS
BITS is a system service that applications can use to transfer files asynchronously between a client and an HTTP server.
Microsoft BITS/6.7
Microsoft_Internet_Explorer
[Microsoft_Internet_Explorer]
Microsoft_Internet_Explorer
I have no idea what this is. It shows up with a variation on the basic user agent, tries to read zzrobots.txt, then scrapes my downloads page and leaves.
Miscellaneous Browsers
[Mozilla/5.0 (Macintosh; ?; PPC Mac OS X;*) AppleWebKit/* (*) HistoryHound/*]
HistoryHound
Used for going back to websites in History and Bookmarks. Supposedly works with any Mac browser.
Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/417.9 (KHTML, like Gecko) HistoryHound/1.9[SCEJ PSP BROWSER 0102pspNavigator]
Wipeout Pure
Some sort of web browser for Sony’s PSP.
SCEJ PSP BROWSER 0102pspNavigator[ogeb browser , Version 1.1.0]
ogeb browser
I cannot find any info on this supposed browser. Maybe it’s a spoof. Either way it was not badly behaved.[Kopiczek/* (WyderOS*; *)]
Kopiczek
Polish browser using WyderOS. I can’t find out anything more about it than that.[Mozilla/4.0 (compatible; ibisBrowser)]
ibisBrowser
Japanese language web browser.[Mozilla/* (Win32;*Escape?*; ?)]
Escape
Espial Escape, a Java browser with scalable configuration capabilities, can be setup to match the memory requirements of a wide range of devices. Escape allows developers to selectively disable support for certain Internet standards so that browsers can be tailored to run in very resource-constrained designs or offer full functionality for other more powerful devices.
Mozilla/4.61 [en] (Win32; Escape 4.8; U)
Mozilla/4.61 [en] (Win32; Escape 5.03; I)
Mozilla/4.76 [en] (Win32;Escape 4.8; U)
Mozilla/4.76 [en] (Win32;Escape 5.03; U)[Sleipnir*]
Sleipnir
A Japanese browser that can also be scripted thus turning it into a website stripper. For now it seems well behaved on my sites so I won’t ban it. It is a wrapper and comes in versions for Trident and Gecko.
Sleipnir
Sleipnir Version 1.40
Sleipnir Version 1.41
Sleipnir Version 1.42
Sleipnir Version 1.66
Sleipnir/2.40
Sleipnir/2.41
Sleipnir/2.45
Sleipnir/2.46
Sleipnir/2.47[NetRecorder*]
NetRecorder
Home page is no longer operational.[GreenBrowser]
GreenBrowser
From their website: GreenBrowser is yet another IE based browser that offers tabbed, multi-page browsing and many additional features including grouped pages, ad filtering, search engine integration, privacy cleaner, form filler and much more.
GreenBrowser
Mozilla 1.9
[Mozilla 1.9]
Mozilla
From their website: Gran Paradiso Alpha 1 is an early developer milestone for the next generation of Mozilla’s layout engine, Gecko 1.9.
http://developer.mozilla.org/devnews/index.php/2006/12/08/gran-paradiso-alpha-1-now-available-for-download/
NameProtect
[NPBot*]
NameProtect
NPBot (NameProtect Bot) engages in crawling activity in search of a wide range of brand and other intellectual property violations that may be of interest to their clients. It does seem to read and respect robots.txt but I don’t want it crawling my site.
NPBot
NPBot (http://www.nameprotect.com/botinfo.html)
NPBot-1/2.0
NPBot-1/2.0 (http://www.nameprotect.com/botinfo.html)
NPBot/3 (NPBot; http://www.nameprotect.com; npbot@nameprotect.com)[NP/*]
NameProtect
NP (NameProtect) engages in crawling activity in search of a wide range of brand and other intellectual property violations that may be of interest to their clients. It does seem to read and respect robots.txt but I don’t want it crawling my site.
NP/0.1 (NP; http://www.nameprotect.com; npbot@nameprotect.com)
Naver
[Cowbot-* (NHN Corp*naver.com)]
Naver Cowbot
Seems to be associated with naver.com.
Cowbot-0.1.1 (NHN Corp. / 82-2-3011-1954 / nhnbot@naver.com)
[Yeti/*]
Yeti
Part of naver.com.
NewsGator
[NewsGator/*]
NewsGator
Did not request robots.txt and got caught in a bot trap.
http://www.newsgator.com
[NetNewsWire*/*]
NetNewsWire
From their website: NetNewsWire is an easy-to-use RSS Web news reader for Mac OS X.
NetNewsWire/2.0.1 (Mac OS X; http://ranchero.com/netnewswire/)
Nutch
[CazoodleBot/*]
CazoodleBot
This is Nutch in disguise!
http://www.cazoodle.com
[Nutch]
Nutch
Does not read robots.txt. I have no idea what this company does. Their website is essentially a blank page.
[LOOQ/0.1*]
LOOQ
Claims to be Nutch (in disguise).
LOOQ/0.1 alfa (LOOQ Crawler for european sites; http://looq.eu; root (at) looq dot eu)
Offline Browsers
[*HTTrack*]
HTTrack
From their website: HTTrack is a free (GPL, libre/open source) and easy-to-use offline browser utility.
HTTrack Website Copier/3.0x (offline browser; web mirror utility)
Mozilla/4.5 (compatible; HTTrack 2.0x; Windows 98)
Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 2000)
Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)
[*Check&Get*]
Check&Get
From their website: Check&Get is handy and powerful bookmark manager and web monitoring program that lets you organize your browser bookmarks, check your favorite Internet pages and detect if their content has changed or has become unavailable. My comments: While it did not read any excluded files, neither did it consult robots.txt first to be certain of that. That’s why this UA is in the stripper category.
Mozilla/2.0 compatible; Check&Get 1.14 (Windows 98)
Mozilla/2.0 compatible; Check&Get 1.14 (Windows NT)
Mozilla/4.0 (compatible; Check&Get 3.0; Windows NT)
[*TweakMASTER*]
TweakMASTER
Claims to be an Internet connection optimizer and what amounts to an offline browser.
Mozilla/3.0 (compatible; TweakMASTER 2.06784; Windows NT 5.1)
Mozilla/3.0 (compatible; TweakMASTER 2.06788; Windows ME)
Mozilla/3.0 (compatible; TweakMASTER)
TweakMASTER 2.x
Online Scanners
[Morfeus Fucking Scanner]
Morfeus Fucking Scanner
Morfeus Fucking Scanner looking for PHP vulnerabilities from this data center: Coreix Limited Admin (COREIX-DS2).
[Mozilla/4.0 (compatible; Trend Micro tmdr 1.*]
Trend Micro
This is the HouseCall online virus scanner from Trend Micro.
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1000)
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1032)
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1110)
Mozilla/4.0 (compatible; Trend Micro tmdr 1.0-1139)
Mozilla/4.0 (compatible; Trend Micro tmdr 1.2-1003)
[virus_detector*]
Secure Computing Corporation
Sells anti-spam and security products. Not sure why they crawl my website.
virus_detector (virus_harvester@securecomputing.com)
virus_detector virus_harvester@securecomputing.com
[Titanium 2005 (4.02.01)]
Panda Antivirus Titanium
This appears to be Panda Antivirus Titanium 2005. Based on the files it requested it appears to be a human browsing several of my websites. I do not understand why I’m seeing this user agent unless it’s spoofed.
PeerFactory
[PeerFactory]
PeerFactory
Java class. Very badly behaved crawler. Took my index page hundreds of times before it was automatically banned.
http://www.nextapp.com/platform/echo2/echo/doc/api/public/app/nextapp/echo2/app/util/PeerFactory.html
Pocket PC
[*(compatible; MSIE *.*; Windows CE; PPC; *)]
Pocket PC
Siemens mobile devices running WinCE.
Pogodak
[Mozilla/5.0 (compatible; TridentSpider/*)]
Pogodak!
Used to be Trident Search. Now Pogodak!.
Mozilla/5.0 (compatible; TridentSpider/3.1)
[Pogodak]
Pogodak!
The entire D class where this crawler comes from is banned for abusive behavior.
Proxy Servers
[CE-Preload]
CE-Preload
Cisco Content Engine
CE-Preload
[Mozilla/5.0 (compatible; del.icio.us-thumbnails/*; *) KHTML/* (like Gecko)]
Yahoo!
I am sick of Yahoo’s open proxies. I banned the entire netrange for YAHOO-3. Just this week alone my site got ripped by a dozen user agents from this proxy.
[ProxyTester*]
ProxyTester
Software that looks for proxy servers and uses them to surf more or less anonymously.
ProxyTester
[SurfControl]
SurfControl
From their website: SurfControl helps companies stop unwanted content. Our highly sophisticated Content Filters understand Internet content, and put you back in control by filtering out the material you don’t want, so you can get to what you do want, when you want it.
SurfControl
Research Projects
[USyd-NLP-Spider*]
USyd-NLP-Spider
It claims to read and respect robots.txt. It did read it but it did not respect it. From their website: USyd-NLP-Spider gathers HTML pages for the purpose of research in Natural Language Processing at the School of Information Technologies, University of Sydney, Australia.
USyd-NLP-Spider (http://www.it.usyd.edu.au/~vinci/bot.html)
[woriobot*]
woriobot
University of British Columbia Laboratory for Computational Sciences.
http://www.worio.com/
[CMS crawler (?http://buytaert.net/crawler/)]
Research Projects
This is some university student who is heavily involved with Drupal. The URL in the UA is wrong. No robots.txt.
[Taiga web spider]
Taiga
Yet another annoying and badly behaved bot from the brilliant students at Brown University. It doesn’t request robots.txt until several minutes into the crawl and then it doesn’t respect disallowed files and winds up in a bot trap. I have their netrange banned in my firewall.
[wwwster/* (Beta, mailto:gue@cis.uni-muenchen.de)]
wwwster
Probably a research bot. Sent an e-mail on 1/15/2006.
wwwster/1.4 (Beta, mailto:gue@cis.uni-muenchen.de)
[Forschungsportal/*]
Forschungsportal
Used by Federal Ministry of Education and Research
[UofTDB_experiment* (leehyun@cs.toronto.edu)]
UofTDB Experiment
Yet another research project from a university. This time it’s the University of Toronto.
UofTDB_experiment (leehyun@cs.toronto.edu)
[HooWWWer/*]
HooWWWer
Crawler for a research service called Next Generation Information Retrieval. The author says all the right things about ethical crawling on his site. So far this seems to be a well-behaved crawler. All it’s done so far though is read robots.txt. Still, being a research project I choose to ban it.
HooWWWer/2.1.0 ( http://cosco.hiit.fi/search/hoowwwer/ | mailto:crawler-infohiit.fi)
HooWWWer/2.1.3 (debugging run) ( http://cosco.hiit.fi/search/hoowwwer/ | mailto:crawler-infohiit.fi)
HooWWWer/2.2.0 (debugging run) ( http://cosco.hiit.fi/search/hoowwwer/ | mailto:crawler-infohiit.fi)
[Amico Alpha * (*) Gecko/* AmicoAlpha/*]
Amico Alpha
According to their website this organization dissolved in 2005. Maybe it’s coming back to life. Their crawler does not read robots.txt and fell into a spider trap.
Amico Alpha 1.0 (Windows; U; Win98; de-DE; rv:1.1.1) Gecko/20051001 AmicoAlpha/1.0
Rippers
[SiteParser/*]
SiteParser
From their website: The SiteParser is a site indexer, it will index your web pages through the internet or locally on your hard drive. I’m banning it because the author asked people to restrict it to domains owned by the user but clearly that’s not happening.
SiteParser/1
[Mozilla/2.0 (compatible; NEWT ActiveX; Win32)]
NEWT ActiveX
This used to be a product from Delphi but the product has been abandoned.
Mozilla/2.0 (compatible; NEWT ActiveX; Win32)
[Mozilla/4.0 (compatible; BorderManager*)]
Novell BorderManager
For some reason I only see this user agent stealing photos from my various websites.
Mozilla/4.0 (compatible; BorderManager 3.0)
[AutoHotkey]
AutoHotkey
From their website: AutoHotkey is a free, open-source utility for Windows that will let you automate almost anything by sending keystrokes and mouse clicks. You can write a mouse or keyboard macro by hand or use the macro recorder.
http://www.autohotkey.com/
[3wGet/*]
3wGet
From their website: 3wGet is the powerful download manager and websites downloader. It is designed for downloading files and web servers from Internet with the best possible speed which your connection can give you. It’s achieved due to splitting downloading file onto several sections, each of which is downloading simultaneously.
3wGet/151
[Holmes/*]
Holmes
Holmes is an easy-to-use addition to MacOS 8.5’s Sherlock which provides the user with the ability to create search sets with similar Internet search sites (plug-ins) grouped together.
Holmes/1.0
holmes/2.3
holmes/2.4
holmes/3.9 (onet.pl)
[sherlock/*]
Sherlock
Now, instead of tediously selecting Web search sites in Sherlock, simply select a set in No Shoot! Sherlock and launch Sherlock. Your set is now present in Sherlock without all the clutter of your remaining SRC files. Check the program site for more information.
sherlock/1.0
[OCN-SOC/*]
OCN-SOC
Japanese page ripper
OCN-SOC/1.0
[CFNetwork/*]
CFNetwork
I’m not positive about this because I can’t test it myself. Based on my research it’s my understanding this is the user agent that’s sent when you use Cocoa’s NSURL function to fetch a web page.
CFNetwork/0.9
CFNetwork/1.1
CFNetwork/10.4.3
CFNetwork/10.4.4
CFNetwork/129.10
CFNetwork/129.13
CFNetwork/129.16
CFNetwork/4.0
[URL2File/*]
URL2File
From their website: URL2File is a free 32bit Windows console-mode application able to retrieve and save the content of a given World Wide Web or FTP URL to a local file.
URL2File/2.0 (Win98)
[libcurl-agent/*]
libcurl
A multiprotocol file transfer library related to cURL.
libcurl-agent/1.0
[WinScripter iNet Tools]
WinScripter iNet Tools
From their website: wsInetTools v0.3 beta: is a COM dll written in C++ that allows you to easily send email and download a web page and binary contents such as images, programs, etc.
WinScripter iNet Tools
[HttpSession]
HttpSession
From their website: The servlet container uses this interface to create a session between an HTTP client and an HTTP server.
HttpSession
[httpunit/*]
HttpUnit
Site tester being used as a site ripper. Does not read robots.txt.
httpunit/1.5
[Artera (Version *)]
Artera
Internet Accelerator
[PigBlock (Windows NT 5.1; U)*]
PigBlock
PigBlock (Windows NT 5.1; U) [en]
PigBlock (Windows NT 5.1; U) [en] Gecko
[BasicHTTP/*]
BasicHTTP
From their website: A full-featured HTTP socket for REALBasic.
BasicHTTP/1.0
[W3CRobot/*]
W3CRobot
I don’t like automated agents using my downloads.asp page to check for browscap.ini updates!
W3CRobot/5.4.0 libwww/5.4.0
[3D-FTP/*]
3D-FTP
From their website: 3D-FTP is FTP Client software helping you transfer files up to 20x faster over Internet.
3D-FTP/7.0
[POE-Component-Client-HTTP/*]
POE-Component-Client-HTTP
From their website: a HTTP user-agent component
POE-Component-Client-HTTP/0.510 (perl; N; POE; en; rv:0.510)
POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)
[Twisted PageGetter]
Twisted PageGetter
This is a Python Twisted-based spider. It did not read robots.txt.
Twisted PageGetter
[DataCha0s/*]
DataCha0s
All reports indicate this crawler is dedicated to finding programs with known vulnerabilities. In particular it seems to like web stats programs and gallery applications. It was originally located at http://datacha0s.50megs.com but that site no longer exists.
DataCha0s/2.0
[SBL-BOT*]
BlackWidow
BlackWidow is a site scanner, a site mapping tool, a site ripper, a site mirroring tool, an offline browser. Use it to scan a site and create a complete profile of the site’s structure, files, E-mail addresses, external links and even link errors. BlackWidow will also scan HTTP sites, SSL sites (HTTPS) and FTP sites.
SBL-BOT (http://sbl.net)
[LeechFTP]
LeechFTP
Being used via a Thai DC. LeechFTP as a project died in 1999 so I cannot imagine why anyone is still using it.
http://en.wikipedia.org/wiki/LeechFTP
http://www.asiagenial.com/en/
[CobWeb/*]
CobWeb
HTML editor that can also be used to rip websites.
[hcat/*]
hcat
A program that uses the Perl socket library to do simple HTTP operations.
[Open Web Analytics Bot*]
Open Web Analytics Bot
Originally a reporting system for WordPress. Now it can be used to crawl websites.
[Snoopy*]
Snoopy
From their website: Snoopy is a PHP class that simulates a web browser. It automates the task of retrieving web page content and posting forms, for example.
[Custo*]
Custo
From their website: Capable of reading HTML, CSS, JavaScript, and Shockwave Flash, Custo allows you to quickly retrieve information about the structure of a Web site.
Custo 1.7 (www.netwu.com)
Custo 1.8 (www.netwu.com)
Custo 1.9 (www.netwu.com)
Custo 2.0 (www.netwu.com)
[curl/*]
cURL
From their website: Curl is a command line tool for transferring files with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. Curl supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, kerberos, HTTP form based upload, proxies, cookies, user+password authentication, file transfer resume, http proxy tunneling and a busload of other useful tricks.
curl/7.10.2 (powerpc-apple-darwin7.0) libcurl/7.10.2 OpenSSL/0.9.7b zlib/1.1.4
curl/7.10.3 (i386-redhat-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.6b zlib/1.1.3
curl/7.10.3 (i386-unknown-openbsd3.1) libcurl/7.10.3 OpenSSL/0.9.6b ipv6 zlib/1.1.4
curl/7.10.3 (i586-mandrake-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.7a zlib/1.1.4
curl/7.10.3 (i686-pc-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.7 zlib/1.1.4
curl/7.10.4 (i386-redhat-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b ipv6 zlib/1.1.4
curl/7.10.4 (i386-redhat-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b zlib/1.1.3
curl/7.10.4 (i386-redhat-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b zlib/1.1.4
curl/7.10.4 (i686-pc-linux-gnu) libcurl/7.10.4 OpenSSL/0.9.6b zlib/1.1.3
curl/7.10.5 (i386-unknown-openbsd3.1) libcurl/7.10.5 OpenSSL/0.9.6b ipv6 zlib/1.1.4
curl/7.10.5 (i686-pc-linux-gnu) libcurl/7.10.5 OpenSSL/0.9.6b zlib/1.1.4
curl/7.10.5 (i686-suse-linux) libcurl/7.10.5 OpenSSL/0.9.7b ipv6 zlib/1.1.4
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.10.6 OpenSSL/0.9.7a ipv6 zlib/1.1.4
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.10.6 OpenSSL/0.9.7a ipv6 zlib/1.2.0.7
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.10.6 OpenSSL/0.9.7a ipv6 zlib/1.2.1.2
curl/7.10.6 (i386-redhat-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7a zlib/1.1.4
curl/7.10.7 (i386-portbld-freebsd4.3) libcurl/7.10.7 OpenSSL/0.9.6g zlib/1.1.4
curl/7.10.7 (i386-portbld-freebsd4.8) libcurl/7.10.7 OpenSSL/0.9.7a ipv6 zlib/1.1.4
curl/7.10.7 (i586-mandrake-linux-gnu) libcurl/7.10.7 OpenSSL/0.9.7b zlib/1.1.4
curl/7.10.7 (i686-pc-linux-gnu) libcurl/7.10.7 OpenSSL/0.9.7c zlib/1.1.4
curl/7.10.7 (i686-redhat-linux-gnu) libcurl/7.10.7 OpenSSL/0.9.6b ipv6 zlib/1.1.4
curl/7.10.8 (i686-pc-linux-gnu) libcurl/7.10.8 OpenSSL/0.9.6c zlib/1.1.4
curl/7.10.8 (i686-pc-linux-gnu) libcurl/7.10.8 OpenSSL/0.9.7a ipv6 zlib/1.1.4
curl/7.11.0 (i386-portbld-freebsd4.10) libcurl/7.11.0 OpenSSL/0.9.7d zlib/1.1.4
curl/7.11.0 (i386-portbld-freebsd4.9) libcurl/7.11.0 OpenSSL/0.9.7c zlib/1.1.4
curl/7.11.0 (i586-mandrake-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7c zlib/1.2.1
curl/7.11.0 (i686-pc-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7a zlib/1.1.4
curl/7.11.0 (i686-pc-linux-gnu) libcurl/7.11.0 OpenSSL/0.9.7d ipv6 zlib/1.2.1
curl/7.11.0 (i686-suse-linux) libcurl/7.11.0 OpenSSL/0.9.7d ipv6 zlib/1.2.1
curl/7.11.1 (i386-redhat-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a ipv6 zlib/1.1.4
curl/7.11.1 (i386-redhat-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a ipv6 zlib/1.2.1.2
curl/7.11.1 (i686-pc-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a zlib/1.1.4
curl/7.11.1 (powerpc-apple-darwin7.5.0) libcurl/7.11.1 OpenSSL/0.9.7d ipv6 zlib/1.1.4
curl/7.11.1 (powerpc-apple-darwin7.7.0) libcurl/7.11.1 OpenSSL/0.9.7d ipv6 zlib/1.1.4
curl/7.11.2 (i686-pc-linux-gnu) libcurl/7.10.2 OpenSSL/0.9.6i ipv6 zlib/1.1.4
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7a zlib/1.1.4
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7d ipv6 zlib/1.2.1
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7d ipv6 zlib/1.2.2
curl/7.12.0 (i686-pc-linux-gnu) libcurl/7.12.0 OpenSSL/0.9.7e ipv6 zlib/1.2.2
curl/7.12.1 (i386-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
curl/7.12.1 (i686-pc-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7d zlib/1.2.1
curl/7.12.1 (i686-redhat-linux-gnu) libcurl/7.12.1 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
curl/7.12.2 (i386-pc-linux-gnu) libcurl/7.12.2 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.2
curl/7.12.2 (i386-pc-win32) libcurl/7.12.2 zlib/1.2.1
curl/7.12.3 (i386-redhat-linux-gnu) libcurl/7.12.3 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
curl/7.13.0 (i386-pc-linux-gnu) libcurl/7.13.0 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.2
curl/7.13.1 (i386-pc-linux-gnu) libcurl/7.13.1 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.2
curl/7.13.1 (i386-portbld-freebsd4.10) libcurl/7.13.1 OpenSSL/0.9.7d zlib/1.1.4
curl/7.13.1 (i386-portbld-freebsd5.3) libcurl/7.13.1 OpenSSL/0.9.7g zlib/1.2.1
curl/7.13.1 (i386-portbld-freebsd5.4) libcurl/7.13.1 OpenSSL/0.9.7e zlib/1.2.1
curl/7.13.1 (i386-redhat-linux-gnu) libcurl/7.13.1 OpenSSL/0.9.7f zlib/1.2.2.2 libidn/0.5.15
curl/7.13.1 (i686-pc-linux-gnu) libcurl/7.13.1 OpenSSL/0.9.7e zlib/1.2.2
curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7b zlib/1.2.2
curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7g zlib/1.2.3
curl/7.13.1 (powerpc-apple-darwin8.0) libcurl/7.13.1 OpenSSL/0.9.7i zlib/1.2.3
curl/7.13.2 (i386-pc-linux-gnu) libcurl/7.13.2 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.13
curl/7.13.2 (i686-pc-linux-gnu) libcurl/7.13.2 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.15
curl/7.13.2 (i686-pc-linux-gnu) libcurl/7.13.2 OpenSSL/0.9.7e zlib/1.2.3 libidn/0.5.15
curl/7.14.0 (i386-pc-linux-gnu) libcurl/7.14.0 OpenSSL/0.9.7e zlib/1.2.2 libidn/0.5.13
curl/7.14.0 (i386-portbld-freebsd4.9) libcurl/7.14.0 OpenSSL/0.9.7b zlib/1.1.4
curl/7.14.0 (i386-portbld-freebsd5.4) libcurl/7.14.0 OpenSSL/0.9.7e zlib/1.2.1
curl/7.14.0 (i386-portbld-freebsd6.0) libcurl/7.14.0 OpenSSL/0.9.7e zlib/1.2.2
curl/7.14.0 (i486-pc-linux-gnu) libcurl/7.14.0 OpenSSL/0.9.7g zlib/1.2.3 libidn/0.5.13
curl/7.14.1 (i386-portbld-freebsd4.7) libcurl/7.14.1 OpenSSL/0.9.8 zlib/1.1.3
curl/7.15.0 (powerpc64-unknown-linux-gnu) libcurl/7.15.0 OpenSSL/0.9.7e zlib/1.2.3
curl/7.15.1 (i386-portbld-freebsd4.11) libcurl/7.15.1 OpenSSL/0.9.7d zlib/1.1.4
curl/7.15.1 (i586-pc-mingw32msvc) libcurl/7.15.1 zlib/1.2.2
curl/7.15.1 (i586-trustix-linux-gnu) libcurl/7.15.1 OpenSSL/0.9.7i zlib/1.2.3
curl/7.15.1 (x86_64-pc-linux-gnu) libcurl/7.15.1 OpenSSL/0.9.7j zlib/1.2.3
curl/7.15.1 (x86_64-pc-linux-gnu) libcurl/7.15.1 zlib/1.2.3
curl/7.15.3 (i386-portbld-freebsd6.0) libcurl/7.15.3 OpenSSL/0.9.7e zlib/1.2.2
curl/7.15.3 (i686-pc-linux-gnu) libcurl/7.15.3 OpenSSL/0.9.7a zlib/1.2.1.2 libidn/0.5.6
curl/7.15.4 (i686-pc-linux-gnu) libcurl/7.15.4 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.3
curl/7.15.5 (i686-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5
curl/7.15.5 (i686-redhat-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8a zlib/1.2.3 libidn/0.6.2
curl/7.7.2 (powerpc-apple-darwin6.0) libcurl 7.7.2 (OpenSSL 0.9.6b)
curl/7.7.2 (powerpc-apple-darwin6.0) libcurl 7.7.2 (OpenSSL 0.9.6e) (ipv6 enabled)
curl/7.8 (i386-redhat-linux-gnu) libcurl 7.8 (OpenSSL 0.9.6b) (ipv6 enabled)
curl/7.9 (i386-unknown-freebsd4.2) libcurl 7.9 (OpenSSL 0.9.6i) (ipv6 enabled)
curl/7.9.2 (i386-redhat-linux-gnu) libcurl/7.10.3 zlib/1.1.3
curl/7.9.2 (i386-redhat-linux-gnu) libcurl/7.10.3 zlib/1.1.4
curl/7.9.5 (i386-redhat-linux-gnu) libcurl 7.9.5 (OpenSSL 0.9.6b) (ipv6 enabled)
curl/7.9.5 (i586-pc-linux-gnu) libcurl 7.9.5 (ipv6 enabled)
curl/7.9.5 (i586-pc-linux-gnu) libcurl 7.9.5 (OpenSSL 0.9.6a)
curl/7.9.5 (i586-pc-linux-gnu) libcurl/7.10.3 OpenSSL/0.9.6c ipv6 zlib/1.2.1
curl/7.9.7 (i686-pc-linux-gnu) libcurl/7.11.1 OpenSSL/0.9.7a ipv6 zlib/1.1.4
curl/7.9.8 (i386--freebsd4.6) libcurl 7.9.8 (OpenSSL 0.9.6e)
curl/7.9.8 (i386-portbld-freebsd4.1) libcurl 7.9.8
curl/7.9.8 (i386-portbld-freebsd4.7) libcurl 7.9.8 (OpenSSL 0.9.6g) (ipv6 enabled)
curl/7.9.8 (i386-redhat-linux-gnu) libcurl 7.9.8 (OpenSSL 0.9.6b)
curl/7.9.8 (i386-redhat-linux-gnu) libcurl 7.9.8 (OpenSSL 0.9.7a) (ipv6 enabled)
curl/7.9.8 (i386-unknown-freebsd4.6.2) libcurl 7.9.8 (OpenSSL 0.9.6)
curl/7.9.8 (i686-pc-linux-gnu) libcurl 7.9.8 (OpenSSL 0.9.6b) (ipv6 enabled)
[*WebGrabber*]
Rippers
WebGrabber is a utility that you can use to mirror, copy, synchronize, download, scrub or “steal” a web site.
www.substancia.com WebGrabber (ver 1.0)
[*grub-client*]
grub-client
They claim to read/respect robots.txt. I have seen no personal evidence of that.
Update: On March 23, 2004, a Grub client 1.07 (I didn’t think that was an official version; should be 1.0.7?) read robots.txt and then got caught in my trap, so it got no further.
grub-client
Mozilla/4.0 (compatible; grub-client-0.2.3; Crawl your stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-0.2.4; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-0.3.0; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.0.3; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.0.4; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.0.5; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.0.5; sponsored by www.cutecandy.com and grub.org)
Mozilla/4.0 (compatible; grub-client-1.0.6; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.0.7; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.07; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.1.1; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.2.1; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.3.1; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.3.7; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.4.3; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-1.5.3; Crawl your own stuff with http://grub.org)
Mozilla/4.0 (compatible; grub-client-2.3)
Mozilla/4.0 (compatible; grub-client-2.6.0)
Mozilla/4.0 (compatible; grub-client-2.6.1)
[CAST]
CAST
Cast Software sells a data-mining application that can also mine websites.
[b2w/*]
b2w
Almost knocked Webmaster World offline with 6K+ requests in under an hour.
[JPluck/*]
JPluck
I am ashamed of SourceForge for allowing such a badly behaved piece of software to be made available through their website. It does not read robots.txt.
JPluck/2.0.9 (Java 1.4.2_03; Windows XP)
JPluck/2.1.1 (Java 1.4.2_04; Linux)
JPluck/2.1.6 (Java 1.4.2_04; Linux)
JPluck/2.1.6 (Java 1.4.2_04; Windows 2000)
JPluck/2.1.6b (Java 1.4.2; Linux)
JPluck/2.1.6b (Java 1.4.2_03; Windows XP)
JPluck/2.1.6b (Java 1.4.2_04; Windows 2000)
[Kapere (http://www.kapere.com)]
Kapere
This is a download accelerator and “website grabber” (their terminology not mine) that does not read robots.txt.
[ezic.com http agent *]
Ezic.com
IP resolves to NetBilling, Inc. but I have no idea what they’re doing. There is also a website at www.ezic.com but again I have no idea what they might be up to.
[LeechGet*]
LeechGet
Download manager.
LeechGet (www.leechget.net)
LeechGet 2002 (www.leechget.de)
LeechGet 2003 (www.leechget.net)
LeechGet 2004 (www.leechget.net)
LeechGet 2005 (www.leechget.net)
[Mozilla/3.0 (compatible; Indy Library)]
Rippers
This appears to be another automated agent checking my downloads.asp page instead of version.asp as it should be according to my TOS. Part of a Delphi/C++ builder suite of tools for doing internet stuff. The second link is where I found out about this potentially nasty little bot.
Mozilla/3.0 (compatible; Indy Library)
[MovableType/*]
MovableType Web Log
Why is someone’s blog reading the downloads page on my personal website? The answer doesn’t really matter as it didn’t read robots.txt first so it’s banned.
MovableType/2.51
MovableType/2.63
MovableType/2.6
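The bracketed entries in this list are wildcard patterns, where * stands for any run of characters and ? for a single character. As a rough illustration of how a user agent string from a log could be checked against them, here is a minimal Python sketch. The helper names (pattern_to_regex, identify), the small PATTERNS table and the longest-pattern-wins tie-break are assumptions made for this example; the list itself does not specify how matching should be implemented. The patterns are translated by hand instead of using fnmatch, because fnmatch would treat the literal [ and ] that appear in some user agents as character classes.

```python
import re

def pattern_to_regex(pattern):
    """Translate a wildcard pattern (* and ?) into a compiled regex.

    Every other character is escaped so that parentheses, dots and
    square brackets in a user agent are matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

# A handful of patterns copied from the list, mapped to their names.
# This table is illustrative only; a real setup would load the full list.
PATTERNS = {
    "curl/*": "cURL",
    "*HTTrack*": "HTTrack",
    "*grub-client*": "grub-client",
    "JPluck/*": "JPluck",
    "MovableType/*": "MovableType Web Log",
}

def identify(user_agent):
    """Return the name for the longest matching pattern, or None."""
    matches = [p for p in PATTERNS
               if pattern_to_regex(p).fullmatch(user_agent)]
    if not matches:
        return None
    # Assumption: when several patterns match, prefer the longest one.
    return PATTERNS[max(matches, key=len)]

if __name__ == "__main__":
    for ua in (
        "curl/7.15.5 (i686-pc-linux-gnu) libcurl/7.15.5 OpenSSL/0.9.8b zlib/1.2.3 libidn/0.6.5",
        "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
        "MovableType/2.63",
    ):
        print(ua, "->", identify(ua))
```

A user agent that matches none of the patterns simply comes back as None, so what to do with unknown agents (allow, log or ban) stays a separate decision.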
-