Proposed issue: site metadata hook from Tim Berners-Lee on 2003-02-10 (www-tag@w3.org from February 2003)

From: Tim Berners-Lee <timbl@w3.org>
Date: Mon, 10 Feb 2003 08:01:43 -0800
To: www-tag@w3.org
Cc: tag@w3.org
Message-Id: <F2A8BD6C-3D10-11D7-84BF-000393914268@w3.org>

At the face-to-face meeting I took an action to write up a proposal for
the following potential issue:


Proposed Short name:  SiteMetadata-nn

Title:   Web site metadata improving on robots.txt, w3c/p3p, favicon, etc.

The architecture of the web is that the space of identifiers
on an http web site is owned by the owner of the domain name.
The owner, "publisher",  is free to allocate identifiers
and define how they are served.

Any variation from this breaks the web.  The problem
is that there are some conventions for the identifiers on websites,
that

    /robots.txt  is a file controlling robot access
    /w3c/p3p is where you put a privacy policy
    /favicon.ico  is an icon representative of the web site

and who knows what others.  There is of course no
list available of the assumptions different groups and manufacturers
have used.

These break the rule.  If you put a file which happens to be
called robots.txt but has something else in it, then weird things happen.
One might think that this is unlikely, now, but the situation could
get a lot worse.  It is disturbing that a
precedent has been set and the number of these may increase.

There are other problems as well: as sites are catalogued
by a number of different agents, there tend to be all kinds
of requests for things like the above, while one would like to
be able to pick such things up as quickly as possible.

If, when these features were designed, there had been a
general way of attaching metadata to a web site, these
conventions would not have been necessary.

The TAG should address this issue and find a solution,
or put in place steps for a solution to be found,
which allows the metadata about a site, including that for
later applications, to be found with the minimum overhead
and no use of reserved URIs within the server space.

Example solution for feasibility

A new HTTP header such as "Metadata:" is introduced into HTTP.
This takes one parameter, which is the URI of the
metadata document.  The header is supplied in the response to
any GET or HEAD of the root document ("/"). It may also
be supplied on any other response, including error
responses.
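
To make the hook concrete, here is a minimal sketch of how a client
might pick up the metadata URI from a response. The "Metadata" header
is the hypothetical one proposed above and is not part of any HTTP
specification. The function name, the example headers and the choice
to resolve a relative reference against the requested URL are all
assumptions made for illustration.

```python
from urllib.parse import urljoin


def find_metadata_uri(request_url, headers):
    """Return the absolute URI of the site metadata document, or None.

    `headers` is a mapping of response header names to values. The
    "Metadata" header is the hypothetical one proposed in this message.
    """
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "metadata":
            # Assumption: a relative reference is resolved against the
            # URL that was requested.
            return urljoin(request_url, value.strip())
    return None


# Made-up response headers for a GET or HEAD of the root document "/".
response_headers = {
    "Content-Type": "text/html",
    "Metadata": "/about/site.rdf",
}
metadata_uri = find_metadata_uri("http://www.example.org/", response_headers)
print(metadata_uri)
```

A client that finds no such header gets None back and can fall back
to whatever it does today.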

The Metadata document is conventionally written in RDF/XML.
It contains pointers to all kinds of standard and/or proprietary
metadata about the site, including for example

- privacy policy
- robot control
- icon for representing the site
- site maps
- syndication (RSS) feeds
- IPR information
- site policy
- site owners

The solution only needs to document the hook and the
vocabulary to point to metadata resources in current
use.  Vocabulary for new applications can be defined
by those applications.
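
As a rough illustration of the metadata document described above,
the sketch below embeds a small RDF/XML document and pulls the
pointers out of it. The "site" namespace, its property names and all
the example.org URIs are invented for this sketch; no such vocabulary
has been defined. The code reads the XML directly and only handles
this simple shape, so it is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical vocabulary namespace, invented for this sketch.
SITE_NS = "http://example.org/2003/site-meta#"

METADATA_DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:site="{SITE_NS}">
  <rdf:Description rdf:about="http://www.example.org/">
    <site:privacyPolicy rdf:resource="http://www.example.org/policy/p3p.xml"/>
    <site:robotControl rdf:resource="http://www.example.org/policy/robots.txt"/>
    <site:icon rdf:resource="http://www.example.org/images/logo.ico"/>
  </rdf:Description>
</rdf:RDF>
"""


def site_pointers(rdf_xml):
    """Map each hypothetical vocabulary term to the URI it points at."""
    root = ET.fromstring(rdf_xml)
    prefix = f"{{{SITE_NS}}}"  # ElementTree writes tags as {namespace}local
    pointers = {}
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        for prop in description:
            target = prop.get(f"{{{RDF_NS}}}resource")
            # Keep only properties from the site vocabulary that point
            # at another resource.
            if target is not None and prop.tag.startswith(prefix):
                pointers[prop.tag[len(prefix):]] = target
    return pointers


for term, uri in site_pointers(METADATA_DOC).items():
    print(term, uri)
```

An application that cares only about, say, the icon would look up
that one term and ignore the rest, and a new application could add
its own property without reserving any URI on the site.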

timbl

Received on Monday, 10 February 2003 17:25:12 UTC