Re: RULE vs MGET

Patrick Stickler wrote:
>>> (2) it violates the principle of URI opacity
>>
>> Is this a real-world problem? robots.txt violates the principle of
>> URI opacity, but still adds lots of value to the web.
> 
> And it is frequently faulted, and alternatives actively discussed.
> 
> In fact, now that you mention it, I see URIQA as an ideal replacement
> for robots.txt in that one can request a description of the root
> web authority base URI, e.g. 'http://example.com' and receive a
> description of that site, which can define crawler policies in
> terms of RDF in a much more effective manner.

That would carry over one of the reasons we need a replacement for
robots.txt in the first place: its notion of a 'web site' is bad.
robots.txt lives at the root of a host, so it treats one host as one
site. If somebody maintains a website for some project at
http://someuniversity/~name/projectname/, that site should be able to
publish e.g. robot exclusion information without convincing the
university's web server admins or purchasing a domain name. See

     http://www.tbray.org/ongoing/When/200x/2004/01/08/WebSite36

The above proposes a Website: header containing an RDF URI. With URIQA,
you could do an MGET on a page to discover its site, then do an MGET on
that site URI to find out about its robots policy. But doing an MGET on
the root URI of the domain would be flawed in the same way as
robots.txt: it assumes that one domain is one site.
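
The two-step lookup described above could be sketched roughly as
follows. This is only an illustration, not a URIQA client: the property
names "site" and "robotsPolicy", the page URI, and the fake_mget
stand-in (which replaces a real HTTP MGET request) are all assumptions
made up for the example.

```python
from typing import Callable, Dict, List, Optional

Description = Dict[str, str]

# Hypothetical property names. Neither URIQA nor the Website: proposal
# defines them; they only stand in for real RDF properties.
SITE = "site"
ROBOTS_POLICY = "robotsPolicy"


def discover_robots_policy(
    page_uri: str, mget: Callable[[str], Description]
) -> Optional[str]:
    """MGET the page to find its site, then MGET the site for its policy."""
    page_description = mget(page_uri)
    site_uri = page_description.get(SITE)
    if site_uri is None:
        return None  # the page does not say which site it belongs to
    return mget(site_uri).get(ROBOTS_POLICY)


# Stand-in for the network: maps a URI to the description that an MGET
# on that URI is assumed to return.
DESCRIPTIONS: Dict[str, Description] = {
    "http://someuniversity/~name/projectname/page.html": {
        SITE: "http://someuniversity/~name/projectname/",
    },
    "http://someuniversity/~name/projectname/": {
        ROBOTS_POLICY: "noindex",
    },
}

requested: List[str] = []  # records every URI that was MGET'ed


def fake_mget(uri: str) -> Description:
    requested.append(uri)
    return DESCRIPTIONS.get(uri, {})


policy = discover_robots_policy(
    "http://someuniversity/~name/projectname/page.html", fake_mget
)
print(policy)
print(requested)
```

Note that the root URI of the domain is never requested: the project
site describes itself, so nothing has to be placed at the top of the
university's server.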

- Benja

Received on Wednesday, 10 March 2004 06:52:41 UTC