
Re: Proposed issue: site metadata hook

From: Paul Prescod <paul@prescod.net>
Date: Thu, 13 Feb 2003 19:36:41 -0800
Message-ID: <3E4C6449.7060305@prescod.net>
To: Seairth Jacobs <seairth@seairth.com>, www-tag@w3.org

Seairth Jacobs wrote:
>...
> 
> Any such hook might need to keep a few things in mind (imho):
> 
> 1) In the case of /robots.txt, /w3c/p3p.xml, and /favicon.ico, these can be
> easily maintained by even the least experienced person just by copying the
> appropriate file to the appropriate location.  That's it.  No other files,
> headers, server settings, etc. need to be touched.  Requiring people to do
> any more than this seems like an uphill battle.

True, but the end-user's workflow is a reflection of their available 
tools. Just as servers know that "index.html" is magical, they 
could/should know that "robots.txt" is magical. That makes the server 
vendors, not the user, the ones claiming part of the user's namespace, 
which is okay as long as the user can configure the server to use the 
namespace differently (just as you can turn off the magic handling of 
"index.html").
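To make the point concrete, here is a minimal sketch of that idea. All names are hypothetical, not any real server's API: the "magic" paths live in a default table, but the table is configuration, so the site owner can remap or disable any entry, just like DirectoryIndex lets you change the "index.html" magic.

```python
# Hypothetical sketch: the server, not the user, claims the magic paths,
# but the mapping is configurable so the user can reclaim the namespace.

DEFAULT_MAGIC = {
    "/robots.txt": "serve_robots_policy",
    "/favicon.ico": "serve_site_icon",
}

def resolve(path, overrides=None):
    """Return the handler name for a magic path, or None for a normal file.

    `overrides` lets the site owner remap a magic path or disable it
    entirely (by mapping it to None).
    """
    table = dict(DEFAULT_MAGIC)
    if overrides:
        table.update(overrides)
    return table.get(path)
```

With no overrides, `resolve("/robots.txt")` hits the vendor's default; an owner who wants "/robots.txt" to be an ordinary file just passes `{"/robots.txt": None}`.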

> 2) In the case of robots.txt, any hook that provides an added level of
> indirection will likely not be adopted.  For instance, if GoogleBot has to
> issue a HEAD /, then follow a URI (returned in the header) to get back an
> RDF document, then parse the document to find the location of the robots.txt
> file, then turn around and do this for every other site on the web it
> indexes, I'm guessing Google would continue on with the /robots.txt file.

How many "sites" do you think Google indexes versus pages? Also, Google 
doesn't have to do a HEAD. It more likely does a GET, because it is 95% 
likely to need the root homepage anyhow. If it finds a metadata URL, and 
that metadata happens to say "don't index me", then Google throws away 
the page it got.
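The crawl flow described above can be sketched roughly as follows. This is an illustration, not Google's actual logic, and the details are assumptions: the metadata URL is taken to arrive in an HTTP "Link" header, and `fetch_metadata` stands in for whatever retrieves and parses the metadata document.

```python
# Hypothetical sketch: one GET already fetched the homepage; if the
# response advertises a metadata URL and that metadata says "don't
# index", the already-fetched page is simply discarded -- no extra
# round trip happens in the common case.

def should_keep(headers, fetch_metadata):
    """Decide whether to index a fetched page.

    `headers` is the response-header dict from the GET;
    `fetch_metadata` is a callable (assumed interface) that retrieves
    the metadata document and returns it as a dict.
    """
    link = headers.get("Link", "")
    if 'rel="meta"' not in link:
        return True                      # no metadata hook: index as usual
    url = link.split(";")[0].strip("<> ")
    meta = fetch_metadata(url)
    return not meta.get("no-index", False)
```

The point of the sketch is that the indirection only costs a second request for the minority of sites that actually advertise metadata.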

Also, consider how many extra GETs Google must do today for non-existent 
robots.txt files. Surely there is a cost to that, and if more and more 
well-known metadata URIs are added, the system will fail to scale.
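A back-of-envelope calculation shows why this scales badly. Every number here is illustrative, not measured: each additional well-known URI means one probe per site per crawl, and most of those probes will just 404.

```python
# Illustrative arithmetic only -- none of these numbers are measured.
sites = 50_000_000          # assumed number of distinct sites crawled
well_known_uris = 5         # robots.txt, favicon.ico, p3p.xml, ... (assumed)
hit_rate = 0.2              # assumed fraction of probes that find a file

wasted_requests = sites * well_known_uris * (1 - hit_rate)
print(f"{wasted_requests:,.0f} extra 404 probes per full crawl")
```

The waste grows linearly with the number of well-known URIs, which is exactly why a single metadata hook is more attractive than one magic name per application.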

>...
> 3) How much trouble is this causing right now?  In theory, it makes sense
> that the owner of a domain should have full control over his identifiers and
> the resource(s) they point to.  In practice, though, how many people have
> had issues with this, especially compared to the number that haven't had an
> issue?

Personally, I would say it is a fairly major issue that robots.txt can 
only live at the root.

  Paul Prescod
Received on Thursday, 13 February 2003 22:37:40 GMT
