Re: Proposal: Additional 'media' type

> Specifically, a "robot" type. This would allow web designers to create 
> pages that enable search engine indexers to focus on the content, 

This wouldn't work because it is so obviously open to abuse.  The general
policy of search engine indexers is that what the search engine sees should
be what a typical browser sees; otherwise the assumption is that keyword
stuffing is going on and the site should be blacklisted.

If there is a significant difference between what a user should see and
what a search engine will see, there is something fundamentally wrong
with the site; maybe not misrepresentation, but maybe just using the
wrong tool for the job, and therefore ending up with cosmetics encoded
in the HTML.

> enabling the users of search engines to find better results. One of the 

No.  Enabling site authors to misrepresent the site to search engines.
Even just selective quoting of the content can misrepresent the site.

The number of people seeking ways to abuse this mechanism and communicating
them to others will far exceed the number trying to make it work.

> designer switch immediately to an XSLT markup. Search engines should be 
> able to trust the content provided by the robots stylesheet, and could 

But any search engine indexer would fundamentally distrust such material.

> even implement spam-catching checks such as verifying that the 
> searchable blocks are visible (both display and colour) in the 
> screen/print pages. For such a small amount of code, it seems worth it.

It is much easier just to penalise a site for using such features than to
analyse it fully for safe use of them.  Even if the indexer could
be made clever enough to discriminate, clever authors will resort to
marking words out of context.

> I am largely a user of CSS, and as such I very probably missed some 

The fundamental problem with respect to CSS is that this is not styling,
it is actually structural markup, so it is outside the scope of CSS.  meta
description provides a way of giving information to search engines, but is
ignored because of abuse.  That really means that the two remaining good
ways of doing so are front loading the contents, always a good thing,
and proper use of Hn (some search engines supposedly use headings to
produce the summary).
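
As a rough sketch of front loading plus proper Hn use (the page topic,
title and wording here are invented purely for illustration, and nothing
guarantees how any particular indexer treats them):

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Repotting a bonsai</title>
</head>
<body>
  <!-- The main heading and the substance of the page come first -->
  <h1>Repotting a bonsai</h1>
  <p>Repot in early spring, before the buds open, and trim no more
     than a third of the roots.</p>

  <!-- Subsections use H2, not bold text styled to look like a heading -->
  <h2>Choosing the soil</h2>
  <p>Use a free-draining mix.</p>

  <!-- Navigation and other secondary material come after the content -->
  <ul>
    <li><a href="index.html">Home</a></li>
    <li><a href="contents.html">All guides</a></li>
  </ul>
</body>
</html>
```

The same document is served to browsers and indexers alike, so there is
nothing for an indexer to distrust.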

The real way of handling this is to make HTML better at distinguishing
navigation from immediate content, and to have user agents that don't
require every page to also contain the menu structure (e.g. by making
better use of link elements).
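
For illustration only, this is roughly what moving the menu structure
into link elements looks like, using link types that HTML 4.01 already
defines (the file names and titles are hypothetical, and it only helps
where the user agent actually presents link elements to the user):

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Chapter 2: Selectors</title>
  <!-- Navigation lives in the head, not in the body content -->
  <link rel="start"    href="index.html"    title="Front page">
  <link rel="contents" href="contents.html" title="Table of contents">
  <link rel="prev"     href="chapter1.html" title="Chapter 1">
  <link rel="next"     href="chapter3.html" title="Chapter 3">
  <link rel="index"    href="keywords.html" title="Index">
</head>
<body>
  <!-- The body carries only the immediate content of this page -->
  <h1>Chapter 2: Selectors</h1>
  <p>Chapter text starts here.</p>
</body>
</html>
```

An indexer can then take the body as the content and the link elements
as navigation, without needing a separate "robot" view of the page.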

Received on Monday, 14 July 2003 13:14:59 UTC