RE: OK enough, lets fix blogspam from Hallam-Baker, Phillip on 2007-01-03 (public-wsc-wg@w3.org from January 2007)

From: Hallam-Baker, Phillip <pbaker@verisign.com>
Date: Wed, 3 Jan 2007 14:05:17 -0800
To: <michael.mccormick@wellsfargo.com>, <public-wsc-wg@w3.org>
Message-ID: <198A730C2044DE4A96749D13E167AD3701059ECA@MOU1WNEXMB04.vcorp.ad.vrsn.com>

I think it needs to be more granular.

I want Google to index my post but not the comments.

________________________________

From: public-wsc-wg-request@w3.org [mailto:public-wsc-wg-request@w3.org] On Behalf Of michael.mccormick@wellsfargo.com
Sent: Wednesday, January 03, 2007 5:03 PM
To: Hallam-Baker, Phillip; public-wsc-wg@w3.org
Subject: RE: OK enough, lets fix blogspam

Could this be done with a <META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"> tag? Maybe it just needs to be more granular, so that it applies to specific portions of an HTML body instead of a whole page.

Michael McCormick, CISSP
Lead Architect, Information Security

________________________________

From: public-wsc-wg-request@w3.org [mailto:public-wsc-wg-request@w3.org] On Behalf Of Hallam-Baker, Phillip
Sent: Wednesday, January 03, 2007 3:02 PM
To: W3 Work Group
Subject: OK enough, lets fix blogspam

My blogs overrunneth with spam. Yea, my cup is full to overflowing.

None of the spam is targeted at either me or my readers. It is all targeted at Google's web crawler and their PageRank algorithm.

At the F2F meeting, someone sitting opposite me raised a solution similar to the following, but in the context of scripting. It's a simple fix that I think would work.

The idea is to have an HTML attribute or element that allows a server to declare that a section of a Web page came from an external source. All blog comments and the like would be encapsulated this way, so that browsers can look at the content and conclude 'don't run any code from this region', and Web crawlers can ignore the content for the purposes of PageRank and the like.

In order to get maximal security the best approach would be to use some form of nonce sentinel value at the start and finish of the block as was proposed at one of the TIPPI workshops.

In order to engage the type of accountability controls that I want to establish it should also be possible to specify the authenticated poster identity if known.

So for example we might have:

<p>My Web 3.14159265 meme seems to be catching on.
<Inc:Start rel="foreign" authID="mailto:alice@example.com" authmech="saml1.2" sentinel="aegq3tgr2q3uyt1387==" />
Nice post but have you considered this? <a href="http://www.spamisus.com/spork">Spork dietary supplement really works!</a>
<Inc:End sentinel="aegq3tgr2q3uyt1387==">

It needs some work to fit it into XHTML properly. Close tags don't take attributes in XML, which is a challenge.
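
To make the consumer side concrete, here is a rough Python sketch of how a crawler might drop foreign regions before ranking a page. It assumes the hypothetical Inc:Start / Inc:End marker syntax from the example above (not any real standard), uses a regular expression rather than a proper HTML parser, and the function name strip_foreign is invented for illustration. An end marker only closes a region if its sentinel matches the start marker, so an Inc:End forged inside a comment is ignored.

```python
import re

# Hypothetical marker syntax taken from the example; not a real standard.
START = re.compile(r'<Inc:Start\b[^>]*\bsentinel="([^"]*)"[^>]*>')


def strip_foreign(html: str) -> str:
    """Return html with every sentinel-delimited foreign region removed."""
    kept = []
    pos = 0
    while True:
        start = START.search(html, pos)
        if start is None:
            # No further foreign regions: keep the rest of the page.
            kept.append(html[pos:])
            break
        kept.append(html[pos:start.start()])
        sentinel = start.group(1)
        # Only an end marker carrying the same sentinel closes the region.
        end = re.compile(
            r'<Inc:End\b[^>]*\bsentinel="' + re.escape(sentinel) + r'"[^>]*>'
        ).search(html, start.end())
        if end is None:
            # Unterminated region: fail closed and drop the remainder.
            break
        pos = end.end()
    return "".join(kept)
```

Failing closed on an unterminated region is a design choice of this sketch; a browser enforcing 'don't run any code from this region' would probably want the same behaviour.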

To be effective, the sentinel values have to be synthesized on the fly with the rest of the content, but that should not be a huge issue.
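
As a minimal sketch of what "synthesized on the fly" could look like on the server, the Python below generates a fresh random sentinel each time a comment is rendered and wraps the comment in the hypothetical Inc:Start / Inc:End markers. The function name wrap_foreign, the 16-byte sentinel length, and the choice of attributes emitted are assumptions made for illustration, not part of the proposal.

```python
import html
import secrets
from typing import Optional


def wrap_foreign(comment: str, auth_id: Optional[str] = None) -> str:
    """Wrap untrusted comment markup in sentinel-delimited markers."""
    # A fresh sentinel per render, chosen after the comment was submitted,
    # so the poster cannot know it in advance and forge an end marker.
    sentinel = secrets.token_urlsafe(16)
    while sentinel in comment:
        # Vanishingly unlikely, but cheap to rule out.
        sentinel = secrets.token_urlsafe(16)

    attrs = f'rel="foreign" sentinel="{sentinel}"'
    if auth_id is not None:
        # Assumed attribute name, following the authID attribute in the example.
        attrs += f' authID="{html.escape(auth_id, quote=True)}"'

    return f'<Inc:Start {attrs} />\n{comment}\n<Inc:End sentinel="{sentinel}">'
```

Because the sentinel is regenerated on every page render, a cached copy of the page tells a spammer nothing about the value that will guard the next one.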

Where is the best place to work on this? Do we have any Google people here?

Received on Wednesday, 3 January 2007 22:05:38 UTC