- From: Benja Fallenstein <b.fallenstein@gmx.de>
- Date: Wed, 10 Mar 2004 13:52:00 +0200
- To: Patrick Stickler <patrick.stickler@nokia.com>
- Cc: ext Phil Dawes <pdawes@users.sourceforge.net>, www-rdf-interest@w3.org
Patrick Stickler wrote:
>>> (2) it violates the principle of URI opacity
>>
>> Is this a real-world problem? robots.txt violates the principle of
>> URI opacity, but still adds lots of value to the web.
>
> And it is frequently faulted, and alternatives actively discussed.
>
> In fact, now that you mention it, I see URIQA as an ideal replacement
> for robots.txt in that one can request a description of the root
> web authority base URI, e.g. 'http://example.com' and receive a
> description of that site, which can define crawler policies in
> terms of RDF in a much more effective manner.

That would carry over one of the reasons why we need a replacement for robots.txt: that its notion of 'web site' is bad. If somebody maintains a website for some project at http://someuniversity/~name/projectname/, that site should be able to have e.g. robot exclusion information without convincing the university's web server admins or purchasing a domain name.

See http://www.tbray.org/ongoing/When/200x/2004/01/08/WebSite36

The above proposes a Website: header containing an RDF URI. With URIQA, you could do an MGET on a page to discover its site, then do an MGET on that URI to find out about its robots policy.

But doing an MGET on the root URI of the domain would be really flawed.

- Benja
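As a rough illustration of the two-step lookup described above (MGET the page to find its site, then MGET the site to find its robots policy), here is a minimal Python sketch. It makes several assumptions that are not part of URIQA or of the Website: proposal: the property URIs `SITE` and `ROBOTS_POLICY`, the page name `page.html`, and the policy value `noindex` are all hypothetical, and the MGET request itself is replaced by an in-memory stand-in (`fake_mget`) that returns a description as plain (subject, predicate, object) triples rather than parsed RDF.

```python
from typing import Callable, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical property URIs, invented for this sketch only.
SITE = "http://example.org/terms#site"
ROBOTS_POLICY = "http://example.org/terms#robotsPolicy"


def first_object(triples: Iterable[Triple], subject: str, predicate: str) -> Optional[str]:
    """Return the first object matching subject and predicate, or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def robots_policy_for(page_uri: str, mget: Callable[[str], Iterable[Triple]]) -> Optional[str]:
    """Two-step lookup: page -> site URI, then site URI -> robots policy."""
    # Step 1: ask for the page's description and read which site it belongs to.
    site_uri = first_object(mget(page_uri), page_uri, SITE)
    if site_uri is None:
        return None
    # Step 2: ask for the site's description and read its robots policy.
    return first_object(mget(site_uri), site_uri, ROBOTS_POLICY)


# In-memory stand-in for real MGET requests (assumption: no network is used here).
PROJECT = "http://someuniversity/~name/projectname/"
PAGE = PROJECT + "page.html"  # hypothetical page inside the project site

DESCRIPTIONS = {
    PAGE: [(PAGE, SITE, PROJECT)],
    PROJECT: [(PROJECT, ROBOTS_POLICY, "noindex")],  # hypothetical policy value
}


def fake_mget(uri: str) -> Iterable[Triple]:
    """Return the stored description for a URI, or an empty one."""
    return DESCRIPTIONS.get(uri, [])


if __name__ == "__main__":
    print(robots_policy_for(PAGE, fake_mget))
```

Note that the lookup never touches the root URI of the domain: the site is whatever the page's own description says it is, so the project can publish a policy without involving the university's server admins.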
Received on Wednesday, 10 March 2004 06:52:41 UTC