- From: <Patrick.Stickler@nokia.com>
- Date: Thu, 27 Feb 2003 09:55:28 +0200
- To: <ij@w3.org>, <www-tag@w3.org>
> 2.1 Site metadata hook
>
> ...
>
> [Chris]
> there is no way to give a URI of a site as opposed to a URI for
> a welcome page for it
> hmm... sites are significant resources, no? so they should have
> URIs.....
>
> [Roy]
> /

I would propose that

   http://example.com

denotes the HTTP server, thus

   <http://example.com> a x:WebServer .

and that a separate URI scheme is needed to denote the actual physical
machine, since the http: URI scheme is rooted in an HTTP server (web
authority), and the underlying "reality" of what actual machine that
HTTP server is running on is below the "atomic" level of the http: URI
scheme. (see below)

As we wish to make a distinction between the HTTP server and the body
of knowledge served by that server, i.e. between server and site, then
I agree that

   http://example.com/

denotes the web site managed by the HTTP server http://example.com,
thus

   <http://example.com/> a x:WebSite .

Having those two distinct URIs then allows one to speak of the HTTP
server specifically, such as its configuration, and of the web site
specifically, such as access rights, conditions of use of content,
robot/crawling prefs, etc.

And some subspace within that site can also be asserted to be a web
site, such that

   http://example.com/~fred/

denotes Fred's web site, i.e.

   <http://example.com/~fred/> a x:WebSite .

When one does a GET on either http://example.com or http://example.com/
one is simply redirected to a default home web page, which may be
denoted by any of

   http://example.com/index.html
   http://example.com/index.htm
   http://example.com/index.jsp
   http://example.com/foo.blargh

or whatever the HTTP server has been configured to use as the default
page.

When one does an MGET ;-) on http://example.com one gets a description
of the HTTP server.

When one does an MGET on http://example.com/ one gets a description of
the web site, including robot preferences, RSS feeds, whatever.

When one does an MGET on http://example.com/~fred/ one gets a
description of Fred's web site, including Fred's robot preferences, etc.

When one does an MGET on http://example.com/index.html one gets a
description of a web page.

Etc...
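(Purely as an illustration, and not something any deployed server
implements today, here is a rough Python sketch of what issuing such an
MGET might look like. The MGET method itself, the RDF Accept type, and
a server willing to answer the request are all assumptions of this
proposal, not existing behaviour.)

   # Sketch only: MGET is a proposed "metadata GET" method; ordinary
   # HTTP servers will typically answer 405 or 501 to it.
   import http.client

   def mget(authority, path):
       conn = http.client.HTTPConnection(authority)
       # http.client sends whatever method name it is given, so the
       # proposed MGET can be issued without any special support.
       conn.request("MGET", path, headers={"Accept": "application/rdf+xml"})
       response = conn.getresponse()
       description = response.read().decode("utf-8", errors="replace")
       conn.close()
       return response.status, description

   # Under this proposal, MGET on the site URI (with the trailing slash)
   # would return a description of the web site rather than a home page.
   status, description = mget("example.com", "/")
   print(status)
   print(description)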
> [TBray]
> No, "/" isn't the site it's the server, they're not the same
> things

Is that formally defined in some spec somewhere?

Why can't we say that a URI having "http://"{AUTH}"/" denotes the root
site of a given server "http://"{AUTH}? Seems pretty intuitive and
consistent.

As I understand it, the web server behavior is to interpret both
http://example.com and http://example.com/ as resolving to the same
entity. But that resolution process could be seen as a redirection to a
default home page, and that entity as a representation of that home
page. Yet each of those URIs can still denote the server and the site
respectively. The redirection gets around the need for those URIs to
denote the home page, and avoids any ambiguity.

> [timMIT]
> Server isn't a perfect name either ... tends to be a computer.

Tends to, yes, but one physical computer can host many virtual web
servers, with all of those servers' domain names mapped to the single
server IP. The actual physical server level seems completely opaque to
http: URI semantics, which are rooted in the particular HTTP servers,
not in the machines hosting those servers.

If we want an explicit URI to denote a physical machine, we need
something other than an http: URI, *IF* we want that machine's identity
to be independent of any particular web server identity. E.g.

   host:example.com

denotes the physical machine to which the domain name example.com
resolves. One could then make statements about that particular machine,
such as its owner, location, physical characteristics, etc.

> [TBray]
> Chris: echoing problem of site/server disconnect, bad
> architecture to require everyone to write one file
> Chris: if a Site is an important thing, it should have a URI;
> right now there's no such thing
> Chris: per our axioms
> Roy: When robots.txt was invented.. (Chris: everyone had their
> own server) .. the idea was to knock politely on some part of a
> naming authority's domain
> Roy: haven't seen a proposal yet with equivalent semantics

Interesting, I thought the MGET proposal was precisely that:

1. Take the knowledge now expressed in a robots.txt file.
2. Express that knowledge as RDF statements about the web site.
3. Expose that knowledge via a "semantic web enabled" server.
4. Do an MGET on the URI of the web site to obtain that knowledge.

Seems like a very polite way to knock on a naming authority's door to
ask about, well, *anything* within the domain of that naming authority.
Not just about crawling preferences.

And since one can also describe subsites for tenants of the main
server, one can ask specifically about those sites as well, using the
same machinery. And the MGET machinery is fully open and, if the
server/site owners permit, fully supports each tenant expressing their
own knowledge about their own individual sites and content.

So the owner of http://example.com/ can state the site-global
preferences, which may very well permit sub-site crawling. And John can
state the preferences for http://example.com/~john/ (using the very
same vocabulary, no less), which complement the knowledge expressed
about the global site.

I.e., it's the kind of solution that I understand Chris wants. But it
is not just a solution to the issue of a more open, generic,
standardized way to express robots.txt knowledge about a web site; it
is an open, generic, standardized way to express knowledge about *ANY*
resource whatsoever in the domain of a given web authority.

If we're going to change the web architecture, why not "kill a thousand
birds with one stone" rather than just one or two birds? I say that
MGET and friends represent the stone we need.

Cheers,

Patrick

--
Patrick Stickler, Nokia/Finland, (+358 40) 801 9690, patrick.stickler@nokia.com
Received on Thursday, 27 February 2003 02:55:48 UTC