Re: The case against URNs from Paul Prescod on 2002-10-07 (www-tag@w3.org from October 2002)

From: Paul Prescod <paul@prescod.net>
Date: Sun, 06 Oct 2002 17:39:04 -0700
To: "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>, www-tag@w3.org
Message-ID: <3DA0D7A8.2080604@prescod.net>
Champion, Mike wrote:
> 
>...
> 
> Still, I guess my approach would be to say that there can be all sorts of
> knowledge bases that refer to now://www.example.com/MyCar and they could
> describe what type of thing it is, its current state, etc. 

Sure, but given *only* the URI, how do I find them? I think that a big 
part of the Web's "loose coupling" derives from the fact that given a 
URI I can immediately learn about it by dereferencing it. That's why GET 
must be safe and idempotent, so I can do this. (aside: this is why the Web 
doesn't need UDDI: the Web is all about universal description and discovery)

> ... Or, for any
> non-trivial abstract resource, they could describe various assertions, from
> various people with various degrees of authority, about what it is or what
> state it's in.  (Think of now://www.whitehouse.gov/ ... the owner of that
> site's assertion about the state of the thing it refers to is far, far less
> interesting than other people's assertions about the resource.  That's why
> Google pays no attention to META tags, no?).

Google can only answer questions about the subset of the Web that it has 
indexed. Surely Google wouldn't work BETTER if META tags were 
disallowed. It can believe them or ignore them at its own discretion.

Plus, I see your Google and raise you Meerkat. Meerkat believes what 
sites say about themselves and seems to work fine. It clearly depends 
greatly on the domain. Sometimes there is no benefit in lying.

> If the owner of a site does want to put metadata describing the state of the
> abstraction it represents, it can always put it at
> http://www.whitehouse.gov/index.htm (or index.rdf or index.rddl ...).  The
> now:// scheme at least makes explicit that there's a difference between
> www.whitehouse.gov the website and www.whitehouse.gov the Presidency (or the
> residence at 1600 Pennsylvania avenue, or whatever).  

What you're saying is that "now" URIs are dereferenceable (see my 
just-sent correction). So IMHO, it is just syntactic sugar, as the 
trailing "#" is. If you acknowledge there is a problem (as Fielding does 
not) then either solution is fine.

>...
> My assertion -- which I don't hold all that strongly, but would have to be
> persuaded to drop -- is that the "name" http://www.whitehouse.gov is
> fundamentally ambiguous -- it may be the site, or some abstraction -- and
> hence automated inferencing systems will always get confused easily when
> confronted with it.  

All names start out ambiguous. "http://www.prescod.net/foo" could refer 
to anything whatsoever. Sure, Tim B-L claims he knows it refers to a 
document, but a document about what?

A name becomes precise through assertions. What if we required EVERY URI 
in the world to assert its class through rdf:type. Then the white house 
site either declares "rdf:type='politicalDocument'" or 
"rdf:type='politicalOrganization'". It can also assert something like:
rdf:describingDocument="./index.html".

I'm not suggesting that we really require the whole Web to do this...but 
people who want their URIs to be used unambiguously could.
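
A minimal sketch of how such assertions might disambiguate a URI, in 
Python. The triple store is just a list of tuples; the class names follow 
the example above, and the "ex:" prefix and ex:describingDocument property 
are hypothetical placeholders, not a real RDF vocabulary.

```python
# Assertions as (subject, predicate, object) triples. Only rdf:type is a real
# RDF property; the class names and ex:describingDocument are illustrative.
RDF_TYPE = "rdf:type"
DESCRIBING_DOC = "ex:describingDocument"  # hypothetical property

triples = [
    ("http://www.whitehouse.gov/", RDF_TYPE, "ex:politicalOrganization"),
    ("http://www.whitehouse.gov/", DESCRIBING_DOC, "./index.html"),
]


def declared_types(uri, store):
    """Return the set of classes that the store asserts for this URI."""
    return {o for (s, p, o) in store if s == uri and p == RDF_TYPE}


def is_ambiguous(uri, store):
    """A URI with no declared type could still refer to anything."""
    return len(declared_types(uri, store)) == 0
```

A consumer that needs a precise referent could refuse to reason about any 
URI for which is_ambiguous() returns True.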

> ... I suppose that
> could be resolved by fiat ... but how are you going to get people to follow
> it?  It seems better to start with some URI scheme that has not been
> registered (now:// ???) so that any application that has no idea what
> now://www.whitehouse.gov is all about will not attempt to do anything with
> it,

A program either knows what the URI means (perhaps by its surrounding 
context) and thus is not tempted to dereference it, even if it is 
"http://" or it does not know, in which case dereferencing it is the 
right thing to do.
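
The rule in the paragraph above can be sketched as a small Python 
function. The name next_step and the known_meanings mapping are 
hypothetical; they stand in for whatever context a real program has.

```python
def next_step(uri, known_meanings):
    """Decide what a program should do with a URI it encounters.

    known_meanings maps URIs to whatever the program already knows about
    them, for example from the surrounding context.
    """
    if uri in known_meanings:
        # Meaning is already known: no need to dereference, whatever the scheme.
        return ("use", known_meanings[uri])
    # Meaning unknown: dereferencing is how the program finds out.
    return ("dereference", uri)
```

Note that the URI scheme plays no part in the decision; only what the 
program already knows does.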

>...
> For that matter, I don't think I would care if some small set of methods
> *did* apply to now:// URIs, so long as those were appropriately specific to
> the abstractness of the URI, i.e. GETing now://www.whitehouse.gov was defined
> as retrieving the owner of the namespace's assertions about itself or
> something like that.  

If you follow some links from that page you will find some assertions 
about the type and state of that resource! "Three branches of 
government" etc.

It isn't machine readable but that's what "Accept: xml/rdf" is for.
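
A minimal sketch of such a negotiated request, using Python's standard 
urllib. The helper name rdf_request is hypothetical, and "xml/rdf" above is 
shorthand: the sketch assumes the media type application/rdf+xml, which was 
registered for RDF/XML later (RFC 3870). Whether any given server honours 
the header is also an assumption.

```python
from urllib.request import Request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = rdf_request("http://www.whitehouse.gov/")
# To actually send it: urllib.request.urlopen(req). The server is free to
# ignore the Accept header and return HTML anyway.
```

The same URI is used for both the human-readable and the machine-readable 
representation; only the Accept header differs.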

> ... But since the results of a GET on
> http://www.whitehouse.gov are already understood by hundreds of millions of
> pieces of software and millions of web developers, it seems too late to say
> that it SHOULD be a way of describing the type, state, etc. of that abstract
> resource. 

Third parties ALREADY use it that way. "The <a 
href="http://www.whitehouse.gov">DAMN FOOLS in WASHINGTON</a> need a 
kick in the ass".

> Anyway, I hesitate to get into this ... as you say, it's well-trodden
> ground, Dr. Fielding has probably devoted thousands of times more thought to
> it than I have.  I just observe that for whatever reason -- possibly the
> profound denseness of the non-enlightened -- this "axiom" does not seem to
> be fitting cleanly into a realistic description of the Web as it actually
> exists, or (as near as I can tell from following the debate from a distance)
> TimBL's vision of the Semantic Web either.  Not having an intellectual or
> emotional stake in the argument, this leads me to suggest "drop the axiom"
> (not sure exactly which!) and see if the rest of the system falls into place
> more cleanly. 

Roy claims somewhat persuasively that the Web as it exists today DOES 
allow addressing of abstract resources. Let's put it this way: would you 
think it odd if you saw the following markup:

<p>The <a href="http://www.w3.org">W3C</a> is an important <a 
href="http://www.dictionary.com/search?q=organization&r=67">organization</a> 
and <a href="http://www.google.com">Google</a> is an important service. 
</p>

I think that the referents are concepts, not documents. I agree that it 
is equally likely that someone will use those URIs to refer to 
documents. That's why an rdf:type could disambiguate.

So I see Roy's view as being consistent and complete. I don't see any 
harm with Tim B-L's view that the type of all resources addressed with 
URIs without "#" is implicitly "document" (where document is defined to 
include "service") but I'm not convinced it solves anything. URIs are 
likely to be used ambiguously unless they are defined unambiguously by 
their creators in the first place. (so "http://www.whitehouse.gov/" is 
probably way too ambiguous to be used for machine reasoning).

  Paul Prescod
Received on Sunday, 6 October 2002 20:39:41 UTC