Re: http URIs as names and scalability from Miles Sabin on 2002-10-13 (www-tag@w3.org from October 2002)

From: Miles Sabin <miles@milessabin.com>
Date: Sun, 13 Oct 2002 10:11:53 +0100
To: www-tag@w3.org
Message-Id: <200210131011.54102.miles@milessabin.com>

Larry Masinter wrote,
> It would be very bad if the web architecture REQUIRED or even
> ENCOURAGED implementors of widely used media types to actually go off
> and GET the namespace URI. So a web design that had browsers actually
> trying to connect to www.w3.org and "GET /1999/xhtml" whenever they
> tried to open an XHTML document ... well, that would be a bad design.

I've raised this issue several times over the last few years, tho' wrt 
DTD external subset system ids rather than namespace URIs, and the 
response has always seemed to be either that the problem is with poor 
implementations or that caching/catalogs and content distribution are 
the answer.

The problem here is that the poor (or maybe malicious) implementation is 
at the client end, so not under the control of the URI publisher. It 
certainly used to be the case that many off-the-shelf XML parsers would 
by default attempt to retrieve external subsets when validating, and 
many developers were unaware of the need to change that default. We had 
an illustration of the consequences a while back when Netscape "lost" 
the RSS DTD (see http://www.oreillynet.com/cs/weblog/view/wlg/263) and 
lots of people's feeds stopped working. Whilst this was an 
administrative mistake, from the POV of the clients, poorly implemented 
or not, it was indistinguishable from a server failure or denial of 
service.
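
To make "change that default" concrete, here is a rough sketch using 
Python's standard xml.sax module; it is only an illustration, not a 
claim about how any particular parser of the time behaved. The document, 
the DTD URL (example.invalid) and the names RecordingResolver, 
OptionCollector and read_config are all hypothetical. The parser is told 
not to load external general entities, and a recording resolver is 
attached so we can check that nothing was requested.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

# Hypothetical document; the DTD URL is a placeholder and is never fetched.
CONFIG = b"""<?xml version="1.0"?>
<!DOCTYPE config SYSTEM "http://example.invalid/app-config.dtd">
<config><option name="colour">blue</option></config>
"""


class RecordingResolver(EntityResolver):
    """Records every external entity the parser asks to have resolved."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        return systemId


class OptionCollector(ContentHandler):
    """Collects <option name="...">value</option> pairs into a dict."""

    def __init__(self):
        super().__init__()
        self.options = {}
        self._name = None

    def startElement(self, name, attrs):
        if name == "option":
            self._name = attrs.get("name")
            self.options[self._name] = ""

    def characters(self, content):
        if self._name is not None:
            self.options[self._name] += content

    def endElement(self, name):
        if name == "option":
            self._name = None


def read_config(data):
    """Parse data without retrieving the external subset.

    Returns the collected options and the list of system ids the parser
    asked the resolver for (expected to be empty).
    """
    parser = xml.sax.make_parser()
    # Explicitly switch off loading of external general entities, rather
    # than relying on whatever the library default happens to be.
    parser.setFeature(feature_external_ges, False)
    resolver = RecordingResolver()
    handler = OptionCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.options, resolver.requested
```

Calling read_config(CONFIG) should parse the document with the resolver 
never being consulted, so the publisher of the DTD sees no traffic at 
all. The point is that this has to be done by whoever writes the client; 
the publisher of the URI can't do it for them.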

Even more ridiculous would be a use of XML for locally stored 
application configuration information where each read implicitly 
involved network access thanks to an attempt to retrieve DTD or 
namespace information. Aside from making the application unusable on 
disconnected machines, we could easily imagine the users of an 
increasingly popular product eventually slashdotting the vendor. And 
there's also a privacy issue: each retrieval could be construed as the 
application "phoning home".

So I agree with Larry: there definitely is an architectural issue here. 
It's one thing to say that URIs SHOULD be retriev*able*; it's quite 
another to say that they SHOULD be retriev*ed*. That said, as has been 
pointed out more than once, anything which is retrievable probably will 
be retrieved: whether or not it's retrieved often enough to cause 
problems is pretty much indeterminate.

Cheers,


Miles

Received on Sunday, 13 October 2002 05:12:30 UTC