OK enough, let's fix blogspam

My blogs overrunneth with spam. Yea, my cup is full to overflowing.
 
 
None of the spam is targeted at either me or my readers. It is all targeted at Google's web crawler and their PageRank algorithm.
 
At the F2F meeting, someone opposite me raised a solution similar to the following, but in the context of scripting. It's a simple fix that I think would work.
 
 
The idea is to have an HTML attribute or element that allows a server to declare that a section of a Web page came from an external source. The server would wrap all blog comments and the like in it, so that browsers can look at the content and conclude 'don't run any code from this region', and Web crawlers can ignore the content for the purposes of PageRank and the like.
 
In order to get maximal security the best approach would be to use some form of nonce sentinel value at the start and finish of the block as was proposed at one of the TIPPI workshops.
 
In order to engage the type of accountability controls that I want to establish it should also be possible to specify the authenticated poster identity if known.
 
 
So for example we might have:
 
<p>My Web 3.14159265 meme seems to be catching on. 
<Inc:Start rel="foreign" authID="mailto:alice@example.com" authmech="saml1.2" sentinel="aegq3tgr2q3uyt1387==" />
Nice post but have you considered this? <a href="http://www.spamisus.com/spork">Spork dietary supplement really works!</a>
<Inc:End sentinel="aegq3tgr2q3uyt1387==">
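
As a rough sketch of how a crawler (or a browser deciding what not to execute) might consume such a block, the Python below drops everything between an Inc:Start and the Inc:End carrying the same sentinel. The function name strip_foreign and the regex-based matching are my own assumptions for illustration, not part of any spec; a real consumer would do this inside its HTML parser.

```python
import re

# Hypothetical matcher for the Inc:Start element from the example above.
# Assumes attribute values never contain '>' (good enough for a sketch).
START_RE = re.compile(r'<Inc:Start\b[^>]*?\bsentinel="([^"]*)"[^>]*>')


def strip_foreign(page: str) -> str:
    """Return the page with every sentinel-delimited foreign region removed."""
    kept = []
    pos = 0
    while True:
        start = START_RE.search(page, pos)
        if start is None:
            kept.append(page[pos:])  # no more foreign regions
            break
        kept.append(page[pos:start.start()])
        sentinel = start.group(1)
        # Only an Inc:End carrying the *same* sentinel closes the region,
        # so an Inc:End forged inside a comment is treated as plain content.
        end_re = re.compile(
            r'<Inc:End\b[^>]*?\bsentinel="' + re.escape(sentinel) + r'"[^>]*>'
        )
        end = end_re.search(page, start.end())
        if end is None:
            # Unterminated region: fail closed and drop the rest of the page.
            break
        pos = end.end()
    return "".join(kept)
```

A crawler would then count links only in the output of strip_foreign(page), and a comment that embeds its own guessed Inc:End cannot break out of the region because its sentinel will not match.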
 
 
It needs some work to fit it into XHTML properly. Close tags don't take attributes in XML, which is a challenge.
 
To be effective, the sentinel values have to be synthesized on the fly with the rest of the content, but that should not be a huge issue.
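
To give a feel for how little work that is on the server side, here is a minimal Python sketch that generates a fresh sentinel each time a comment is rendered and wraps the comment in the markers from the example. The function name wrap_foreign and its parameters are hypothetical; the element and attribute names simply mirror the example above.

```python
import html
import secrets
from typing import Optional


def wrap_foreign(comment_html: str,
                 auth_id: Optional[str] = None,
                 auth_mech: Optional[str] = None) -> str:
    """Wrap externally supplied markup in sentinel-delimited markers."""
    # Fresh, unpredictable value per rendering; the poster cannot know it
    # in advance, so they cannot embed a matching end marker.
    sentinel = secrets.token_urlsafe(16)
    while sentinel in comment_html:
        # Vanishingly unlikely, but cheap to rule out.
        sentinel = secrets.token_urlsafe(16)

    attrs = ['rel="foreign"']
    if auth_id is not None:
        attrs.append('authID="%s"' % html.escape(auth_id, quote=True))
    if auth_mech is not None:
        attrs.append('authmech="%s"' % html.escape(auth_mech, quote=True))
    attrs.append('sentinel="%s"' % sentinel)

    return '<Inc:Start %s />\n%s\n<Inc:End sentinel="%s">' % (
        " ".join(attrs), comment_html, sentinel)
```

For example, wrap_foreign(comment, auth_id="mailto:alice@example.com", auth_mech="saml1.2") would produce a block shaped like the one above, with a different sentinel on every call.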
 
 
Where is the best place to work on this? Do we have any Google people here?

Received on Wednesday, 3 January 2007 21:02:46 UTC