- From: Tim Bray <tbray@textuality.com>
- Date: Thu, 27 Feb 2003 10:49:17 -0800
- To: www-tag@w3.org
I took an action item last TAG telecon to raise a strawman proposal.

TBL launched this with his proposal at http://lists.w3.org/Archives/Public/www-tag/2003Feb/0093.html. He outlines the problem and proposes a new HTTP header (the note says "HTTP tag" but that's a typo), but isn't quite explicit enough in acknowledging that we're inventing a new architectural thing, the notion of a "site".

Here's how I'd come at it. Right now the web architecture doesn't have any formal notion of a "site", and software that tries to pretend it does by and large doesn't do a very good job (as the author of two large-scale web spiders I have bitter first-hand knowledge). Things like /robots.txt that try to pretend that a host is a site have problems because, well, a host isn't always a site.

So let's introduce a formal notion of a "Web Site", which is a collection of Resources, each identified by URI. A resource can be in more than one site - not an obvious choice, but it seems it would be hard to enforce a rule to the contrary.

Since a Web Site is an interesting and important thing, it ought to be a resource and ought to have a URI. There is no point trying to write any rules about whether all the resources on a site ought to be on the same host or whether the site's URI should look like those of the resources.

Then you introduce a new HTTP header as TBL suggested. I'd call it "Web-site" or just "Site". Any server could, but need not, include this header in a response to a GET or HEAD request.

You could easily include this in the <head> of HTML documents along the lines of

  <meta http-equiv="Web-site" content="http://example.com/site" />

Perhaps <link> would be better, or perhaps the HTML people might want to define new markup for the purpose.

Of course, this leads inevitably to the question of what is a useful representation for a site. The kinds of stuff that could go there could include robots info, language info, favicon.ico equivalent, RSS info, p3p info, etc etc etc.
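
To make the discovery step concrete, here is a rough sketch of how a spider might look for the site URI under this strawman. It is only an illustration: "Web-site" and "Site" are the header names floated above and are not registered HTTP headers, the function name find_site_uri is made up, and the choice to let the HTTP header win over the <meta> element is an assumption the proposal does not settle.

```python
from html.parser import HTMLParser
from typing import Mapping, Optional

# Header names floated in this message; neither is a registered HTTP header.
SITE_HEADERS = ("web-site", "site")


class _MetaSiteParser(HTMLParser):
    """Remembers the content of the first matching <meta http-equiv=...>."""

    def __init__(self) -> None:
        super().__init__()
        self.site_uri: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <meta ... /> via handle_startendtag.
        if tag != "meta" or self.site_uri is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() in SITE_HEADERS:
            self.site_uri = attr.get("content") or None


def find_site_uri(headers: Mapping[str, str], html: str = "") -> Optional[str]:
    """Return the site URI a response advertises, or None if it names none.

    Assumption: the HTTP header takes precedence over the HTML <meta> form.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in SITE_HEADERS:
        if lowered.get(name):
            return lowered[name].strip()
    parser = _MetaSiteParser()
    parser.feed(html)
    return parser.site_uri
```

A spider would call find_site_uri with the headers and body of each GET response and group resources by the URI it returns; a None result means the server chose not to say, which the proposal explicitly allows.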
Unlike the RDDL issues we've been discussing, I see little requirement for human readability in a site's representation, so this feels like a natural for a small (but extensible) RDF vocabulary; who cares if it's ugly. The RDF assertions would mostly have as their subject the URI "", which works well in this case.

-Tim
Received on Thursday, 27 February 2003 13:49:22 UTC