RE: My 2c on scheme abuse from Miles Sabin on 2001-02-06 (uri@w3.org from February 2001)

From: Miles Sabin <MSabin@interx.com>
Date: Tue, 6 Feb 2001 17:43:03 -0000
To: uri@w3.org
Message-ID: <23CF4BF2C499D411907E00508BDC95E116FC07@ntmews_01.interx.com>

Mark Baker wrote,
> Anyhow, I'm finding that I'm repeating myself so it's probably 
> time to stop, at least on the list.

Well, I don't want to appear to be trying to have the last word, 
but ...

> In summary;
> 
> - namespace URLs can be used to do useful things without being 
> resolved (they're locators, but also names)

Agreed.

> - no existing software (AFAIK) needs to resolve a namespace 
> URL

Agreed. *But* it doesn't follow from this that namespace URIs will 
not be resolved excessively.

> - some semantic web apps will need to resolve namespace URLs, 
> but good software will cache the result itself and reuse it 
> until stale.

Agreed. But there are two issues here: Does this scale assuming 
all software is good? And does this scale under more realistic 
assumptions?

I'm not sure on either count. Which leaves ...

> - HTTP proxy chains provide for resolving URLs via caching 
> without ever bothering the origin server

The coordination of proxy chains is no easier a problem to solve 
than, e.g., the widespread deployment of the URI Resolution 
Protocol in support of URNs ... in fact my guess is that it's 
considerably harder outside of environments with a relatively
homogeneous authority.

Wholesale DNS delegation is a blunt instrument, and redirection 
might not be enough. Either might be expensive, hard to maintain, 
or both.

Suppose someone authors an XML vocabulary. She has a domain and
assigns a URI from within it as a bare namespace identifier:
there's no resource to retrieve. There _is_ an http server there
tho', a clapped out old Linux PC on the end of a DSL connection.
Unfortunately (or fortunately, depending on how you look at it)
her vocabulary catches on. But thanks to widely deployed but 
poorly designed client software, attempts to retrieve the non-
existent resource are very frequent and the server falls over
even tho' it's only attempting to deliver 404s, and the client
apps start hanging waiting for responses that'll never come.
Everyone gets upset.
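
For contrast, here is a rough sketch of what a better behaved
client might do: remember the outcome of each lookup (including
404s and network failures) until it goes stale, and put a timeout
on the request so nothing hangs. This is only an illustration
under assumptions of mine. The names NamespaceCache and
http_fetch, the one day lifetime and the five second timeout are
all hypothetical, and a real client would presumably honour HTTP
cache headers rather than a fixed lifetime.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple

# Hypothetical result shape: (HTTP status, body or None).
# Status 0 is used here to mean "no response at all".
Result = Tuple[int, Optional[bytes]]


def http_fetch(uri: str, timeout: float = 5.0) -> Result:
    """Fetch a URI with a timeout so the caller never hangs forever."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, None


class NamespaceCache:
    """Caches lookups, negative ones included, for a fixed lifetime."""

    def __init__(
        self,
        fetch: Callable[[str], Result] = http_fetch,
        ttl_seconds: float = 86400.0,  # assumed lifetime: one day
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Result]] = {}

    def resolve(self, uri: str) -> Result:
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: do not bother the origin server again.
            return entry[1]
        try:
            result = self._fetch(uri)
        except OSError:
            # Timeouts and connection failures are remembered too,
            # so a struggling server is not retried on every parse.
            result = (0, None)
        self._entries[uri] = (now, result)
        return result
```

With something like this each client would hit the server at most
once per lifetime per URI, whatever the answer. It does nothing,
of course, about the clients that don't bother.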

Now, what could the author have done to prevent this situation 
from arising? Alternatively what could she do to fix things once 
it _has_ arisen? Very little, I think, without considerable 
unnecessary expense and complexity. That might be acceptable for 
governments, large corporations, standards bodies, or other 
institutions. But it's not viable for little people ... and we do 
still have some idea of the web as being a medium with a very low 
barrier to entry, don't we?

You might say that this is an issue for any web-accessible
resource, so it's not a particular problem for DTD, Schema or 
namespace URIs ... it's just a special case of the general 
problems of web-scalability. But it's not quite that simple. All 
three could be used in a way far more pervasive than typical web 
content. So, combined with dodgy client software, sites hosting 
them could find themselves more or less _continuously_ 
slashdotted.

In the case of a URI which is only intended to be a bare 
identifier this would be particularly obnoxious.

Cheers,


Miles

-- 
Miles Sabin                               InterX
Internet Systems Architect                5/6 Glenthorne Mews
+44 (0)20 8817 4030                       London, W6 0LJ, England
msabin@interx.com                         http://www.interx.com/
Received on Tuesday, 6 February 2001 12:43:38 UTC