New Attribute Suggestion from Michael A. Peters on 2009-06-16 (public-html-comments@w3.org from June 2009)

From: Michael A. Peters <mpeters@mac.com>
Date: Mon, 15 Jun 2009 23:02:48 -0700
To: public-html-comments@w3.org
Message-id: <4A373588.40200@mac.com>

Hi - I hope this is the right list.

I have a suggestion for a new attribute that could potentially make it 
into the (X)HTML standard.

The attribute is for search engines, to instruct them not to index part 
of a page.

What I'm currently doing in xhtml is this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" 
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!ATTLIST div spider (on | off) #IMPLIED>
]>

The added attribute is spider and takes a value of on or off.

I'm using it in a modification of the open source Sphider search engine 
that I'm working on. The idea is to avoid using HTML comments to turn 
indexing on and off for part of a page.

The actual name and values of such an attribute are definitely open to 
discussion, but I think it should not be specific to any one search 
crawler.

Example of use:

<p>This paragraph is indexed</p>
<p spider="off">This paragraph is not indexed</p>
<p>This paragraph is indexed</p>
<div spider="off">
   <p>This paragraph is not indexed</p>
   <p spider="on">This paragraph is indexed</p>
</div>
<img src="foo.jpg" alt="[This image is indexed]" />
<img src="bar.gif" spider="off" alt="[This image is not indexed]" />

The default is on unless the node or one of its ancestors has turned it 
off; as the example shows, a nested node can turn it back on.
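
The inheritance rule can be sketched in a few lines. This is only an 
illustration in Python using the standard library's 
xml.etree.ElementTree, not the code from the Sphider modification. The 
function name indexed_text and the wrapping body element are 
hypothetical, and the sketch assumes that the nearest explicit setting 
wins and that any value other than on or off is ignored.

```python
import xml.etree.ElementTree as ET


def indexed_text(element, inherited=True):
    """Return text and alt values from nodes whose effective state is on.

    Hypothetical helper: a node takes its own spider value if it has one,
    otherwise it inherits the state of its parent (default: on).
    """
    value = element.get("spider")
    if value == "on":
        state = True
    elif value == "off":
        state = False
    else:
        # Attribute missing or unrecognised (assumption): inherit.
        state = inherited

    parts = []
    if state:
        if element.text and element.text.strip():
            parts.append(element.text.strip())
        alt = element.get("alt")
        if alt:
            parts.append(alt)

    for child in element:
        parts.extend(indexed_text(child, state))
        # Text after a child's closing tag belongs to this element,
        # so it follows this element's state, not the child's.
        if state and child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


SAMPLE = """<body>
<p>This paragraph is indexed</p>
<p spider="off">This paragraph is not indexed</p>
<div spider="off">
   <p>This paragraph is not indexed</p>
   <p spider="on">This paragraph is indexed</p>
</div>
<img src="foo.jpg" alt="[This image is indexed]" />
<img src="bar.gif" spider="off" alt="[This image is not indexed]" />
</body>"""

result = indexed_text(ET.fromstring(SAMPLE))
print(result)
```

Passing the effective state down the recursion means each node only 
needs to look at its own attribute, so no walk back up the tree is 
required.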

It would be useful for things like navigation areas, images/multimedia 
you specifically do not want engines to index, signature areas of 
bulletin boards, etc.

Of course search engines would need their indexers to respect it, but 
that's why a standard attribute is very desirable. With a standard, many 
search engines would implement it because, when properly used by the 
webmaster, it would improve the usefulness of the search engine.

Thoughts?

Received on Tuesday, 16 June 2009 06:03:29 UTC