<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>Kakkoi &#187; Blackhat</title>
	<atom:link href="http://42.kaizeku.com/topics/security/blackhat/feed/" rel="self" type="application/rss+xml" />
	<link>http://42.kaizeku.com</link>
	<description>Web development, software, Windows tips and tricks</description>
	<pubDate>Sat, 12 Jul 2008 15:10:01 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
	<language>en</language>
	<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>How to Track Google Proxy Hack Duplicate Content</title>
		<link>http://42.kaizeku.com/tips/how-to-track-google-proxy-hack-duplicate-contents/</link>
		<comments>http://42.kaizeku.com/tips/how-to-track-google-proxy-hack-duplicate-contents/#comments</comments>
		<pubDate>Fri, 01 Feb 2008 06:29:10 +0000</pubDate>
		<dc:creator>Noah Ark</dc:creator>
		
		<category><![CDATA[Blackhat]]></category>

		<category><![CDATA[Google Alerts]]></category>

		<category><![CDATA[Tips]]></category>

		<category><![CDATA[CopyScape]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[google alerts]]></category>

		<category><![CDATA[google-bug]]></category>

		<category><![CDATA[proxy]]></category>

		<category><![CDATA[proxy hack]]></category>

		<category><![CDATA[webscrapper]]></category>

		<guid isPermaLink="false">http://blog.kakkoi.net/tips/how-to-track-google-proxy-hack-duplicate-contents/</guid>
		<description><![CDATA[

I was quite surprised when I looked at my server logs today: someone decided to scrape my blog content (including my WP translation cache, 100 MB+).
The offending URI:
http://www.shouker.com/user1/baiheinet/2008/1/16/80897.html
I have blocked the site, but that won&#8217;t stop search engine crawlers from indexing the copied content.
This is a nasty black-hat SEO method for getting the target website penalized for [...]]]></description>
			<content:encoded><![CDATA[
<!-- google_ad_section_start -->
<p><img src='http://blog.kakkoi.net/wp-content/uploads/2007/12/marvin-apbot-costume-by-chaoskaizer.jpg' alt='Marvin Apbot costume by chaoskaizer' width="100" height="100" longdesc="http://gmodules.com/ig/proxy?url=http://blog.kakkoi.net/wp-content/uploads/2007/12/marvin-apbot-costume-by-chaoskaizer.jpg" />I was quite surprised when I looked at my server logs today: someone decided to scrape my blog content (including my WP translation cache, 100 MB+).</p>
<pre>The offending URI:
http://www.shouker.com/user1/baiheinet/2008/1/16/80897.html</pre>
<p>I have blocked the site, but that won&#8217;t stop search engine crawlers from indexing the copied content.</p>
<p>This is a nasty black-hat SEO method for getting the target website penalized for duplicate content on the major search engines. Here are a few solutions that I found in various resources &darr;.<br />
<span id="more-167"></span></p>
<ul>
<li>Report it to Google at <dfn title="google proxy hack report">proxyreports@gmail.com</dfn>, providing the URL &#038; the Google search query.</li>
<li>Block the proxy referrer&#8217;s IP.</li>
<li>Add a special noindex meta tag for unknown search engine spiders.
<pre>&lt;META NAME=&quot;ROBOTS&quot; CONTENT=&quot;NOARCHIVE, NOINDEX, NOFOLLOW&quot;&gt;</pre>
</li>
</ul>
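<p>As a rough illustration of the last item, here is a minimal Python sketch that decides whether a response should carry the robots directive quoted above. The token list and the function name <code>robots_directive</code> are assumptions made for this example, not part of any real plugin, and a user-agent string can be spoofed, so treat this as a starting point only.</p>

```python
# Hypothetical list of user-agent tokens for crawlers the site owner trusts.
KNOWN_CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")

# Directive copied from the META tag quoted in the list above.
NOINDEX_DIRECTIVE = "NOARCHIVE, NOINDEX, NOFOLLOW"


def robots_directive(user_agent):
    """Return the robots directive for an unrecognised client, or None for a trusted crawler."""
    ua = user_agent.lower()
    if any(token in ua for token in KNOWN_CRAWLER_TOKENS):
        # Trusted crawler: serve the page without any restriction.
        return None
    # Everyone else (including a proxy fetching the page) gets the noindex directive.
    return NOINDEX_DIRECTIVE
```

<p>The idea is that a proxy fetching the page receives the noindex directive and passes it on with the copied content, while a trusted crawler visiting the site directly does not see it.</p>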
<h2>How to Track Google Proxy Hack Duplicate Content</h2>
<ol>
<li>Monitor your content with <a class="exturl icn-r" href="http://www.google.com/alerts">Google Alerts</a>, using <em>search terms</em> that are unique to your website, e.g. blog.kakkoi, your name, your own unique keywords, your URL (http://blog.kakkoi.net), or a URL-safe base64-encoded string.<br />
If you have a Google Webmaster account, go to <em>Statistics &raquo; What Googlebot sees</em> and use the keywords listed there as your Google Alerts search terms.
</li>
<li>Search for copies of your page on the web with <a href="http://www.copyscape.com/" class="exturl icn-r">Copyscape</a>.</li>
</ol>
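<p>One possible reading of the base64 suggestion in the first step is to derive a unique token from your site URL, place it somewhere in your pages, and use it as a Google Alerts search term. The sketch below assumes that interpretation; the function name <code>fingerprint</code> is made up for this example.</p>

```python
import base64


def fingerprint(url):
    """Return a URL-safe base64 token derived from the given site URL."""
    token = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    # Drop the trailing padding so the token reads as one plain word.
    return token.rstrip("=")


if __name__ == "__main__":
    print(fingerprint("http://blog.kakkoi.net"))
```

<p>Any copy of the page that carries the token unchanged should then turn up when you search for it.</p>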
<h2>Whitelisting Search Engine Crawlers</h2>
<p>IMO, blocking the IP ranges of proxy servers is not very practical. Keeping a whitelist of search engine crawler IPs (class C) might do the trick instead. I&#8217;m working on a script for whitelisting search engine crawlers on my WordPress site. Hopefully I can finish it later this week.</p>
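<p>The whitelist idea can be sketched in a few lines of Python using the standard <code>ipaddress</code> module. This is not the WordPress script mentioned above, only an illustration: the two /24 networks are placeholder documentation ranges, not real crawler addresses, and the name <code>is_whitelisted</code> is an assumption for this example.</p>

```python
import ipaddress

# Placeholder /24 ("class C") networks taken from the RFC 5737 documentation ranges.
# These are NOT real crawler addresses; replace them with ranges you have verified.
CRAWLER_WHITELIST = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_whitelisted(remote_addr):
    """Return True if remote_addr falls inside any whitelisted network."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in CRAWLER_WHITELIST)
```

<p>A request handler could call <code>is_whitelisted</code> with the visitor&#8217;s address and treat everything outside the list as an unknown spider.</p>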
<h2>Google Algo bugs</h2>
<p><span class="vcard"><a href="http://www.seofaststart.com/" class="url fn microformat icn-l">Dan Thies</a></span> at seofaststart.com has posted a detailed analysis of this issue; check out his post &rarr; <a class="exturl icn-r" href="http://www.seofaststart.com/blog/google-proxy-hacking">Google Proxy Hacking: How A Third Party Can Remove Your Site From Google SERPs</a>.</p>
<h2>Recent Update</h2>
<ul>
<li class="cf">Caught the proxy user just after I published this article. It&#8217;s a human: <em>117.8.222.77 / c-net 117.8.0.0/13</em> from Tianjin, China.<br />
<a href='http://blog.kakkoi.net/wp-content/uploads/2008/02/shouker-proxy.png' title='shouker-proxy.png' type="image/png"><img src='/wp-content/uploads/2008/02/shouker-proxy.thumbnail.png' alt='shouker.com proxy user' width='128' height='41' longdesc='http://gmodules.com/ig/proxy?url=http://blog.kakkoi.net/wp-content/uploads/2008/02/shouker-proxy.png' /></a></li>
<li>The IP was graylisted on RBL &#038; cml.anti-spam.org.cn, so we sent a letter to abuse@cnc-noc.net.</li>
</ul>
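<p>To see how wide that network is before deciding whether to block it, the address and the CIDR block from the update above can be checked with Python&#8217;s standard <code>ipaddress</code> module. The variable and function names are assumptions for this sketch.</p>

```python
import ipaddress

# Network and visitor address quoted in the update above.
BLOCKED_NETWORK = ipaddress.ip_network("117.8.0.0/13")
VISITOR = ipaddress.ip_address("117.8.222.77")


def is_blocked(remote_addr, network=BLOCKED_NETWORK):
    """Return True if remote_addr belongs to the given network."""
    return ipaddress.ip_address(remote_addr) in network


if __name__ == "__main__":
    print(is_blocked(str(VISITOR)))        # membership of the caught address
    print(BLOCKED_NETWORK[0])              # first address in the block
    print(BLOCKED_NETWORK[-1])             # last address in the block
    print(BLOCKED_NETWORK.num_addresses)   # total addresses covered by the /13
```

<p>A /13 covers more than half a million addresses, which is one reason blocking whole ranges can be heavy-handed.</p>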
<!-- google_ad_section_end -->
]]></content:encoded>
			<wfw:commentRss>http://42.kaizeku.com/tips/how-to-track-google-proxy-hack-duplicate-contents/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Warez site with High Pagerank. That&#8217;s not fair Google</title>
		<link>http://42.kaizeku.com/google/warez-site-with-high-pagerank-thats-not-fair-google/</link>
		<comments>http://42.kaizeku.com/google/warez-site-with-high-pagerank-thats-not-fair-google/#comments</comments>
		<pubDate>Mon, 19 Nov 2007 18:14:51 +0000</pubDate>
		<dc:creator>Noah Ark</dc:creator>
		
		<category><![CDATA[Blackhat]]></category>

		<category><![CDATA[Google]]></category>

		<category><![CDATA[linkexchange]]></category>

		<category><![CDATA[matt+cutts]]></category>

		<category><![CDATA[pagerank]]></category>

		<category><![CDATA[payperpost]]></category>

		<category><![CDATA[pr6]]></category>

		<category><![CDATA[trustrank]]></category>

		<guid isPermaLink="false">http://blog.kakkoi.net/google/warez-site-with-high-pagerank-thats-not-fair-google/</guid>
		<description><![CDATA[Just google "warez full apps": most of the sites in the results have PR6. I don't want to name any names here, so do your own research and go ask Matt for a good answer. If you aren't satisfied with this situation, I suggest you open a Google Webmaster account and submit those sites for review/sandboxing.

After the recent controversy over the payperpost and linkexchange policy, what do you think of PR in light of this issue? Do you care about SEO?]]></description>
			<content:encoded><![CDATA[
<!-- google_ad_section_start -->
<p>Just google &#8220;warez full apps&#8221;: most of the sites in the results have PR6. I don&#8217;t want to name any names here, so do your own research and go ask <a href="http://www.mattcutts.com/blog/">Matt</a> for a good answer. If you aren&#8217;t satisfied with this situation, I suggest you open a Google Webmaster account and submit those sites for review/sandboxing.</p>
<p>After the recent controversy over the payperpost and linkexchange policy, what do you think of PR in light of this issue? Do you care about SEO?</p>
<!-- google_ad_section_end -->
]]></content:encoded>
			<wfw:commentRss>http://42.kaizeku.com/google/warez-site-with-high-pagerank-thats-not-fair-google/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
