RE: Talked to the xml.gov people from Bullard, Claude L (Len) on 2003-05-22 (www-tag@w3.org from May 2003)

From: Bullard, Claude L (Len) <clbullar@ingr.com>
Date: Thu, 22 May 2003 16:54:26 -0500
To: "'Paul Prescod'" <paul@prescod.net>, WWW-Tag <www-tag@w3.org>
Message-ID: <15725CF6AFE2F34DB8A5B4770B7334EE022DC35B@hq1.pcmail.ingr.com>
Then maybe the TAG should be talking about getting rid of 
URNs altogether, or explaining that HTTP really is meaningless 
to systems that provide PUBLIC to SYSTEM catalog mapping?

PUBLIC identifiers were about systems that assigned names 
but said nothing about resolution, i.e., identity is assigned. 
That is the semantic.

SYSTEM identifiers were about system specific locations of 
entities.  Resolving an address is the semantic.

HTTP identifiers are about names that identify locations 
of entities.  They are a SYSTEM id.  If one wants to use 
them as a name, they have two semantics.  Fine.

HTTP is a protocol identifier.  Saying it is a meaningless 
string until it gets handed to an HTTP handler doesn't add 
much to clarify the situation.   It just means the semantic 
to be implemented is in the handler and is fuzzy in the spec 
because the namespace specification fuzz'd it.

In essence, it makes no difference what goes in that namespace 
id value as long as it is unique within scope.   So why did 
Tim bother to go to xml.gov, and what of value or clarification 
did he tell them since there is no reason to prefer any string 
over another in there given a policy for mapping it to a handler, 
the semantic of which is indeterminate for the purpose of it 
being a namespace identifier?
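
A minimal sketch of that point, assuming Python's standard xml.etree.ElementTree parser and two made-up namespace names (http://example.org/ns and urn:example:ns are illustrative, not from this thread). The parser treats the namespace name as a plain string that qualifies the element name and never tries to fetch it:

```python
import xml.etree.ElementTree as ET

def root_tag(xml_text):
    """Return the namespace-qualified tag of the document element."""
    return ET.fromstring(xml_text).tag

# Both namespace names below are hypothetical examples.
http_style = root_tag('<r xmlns="http://example.org/ns"/>')
urn_style = root_tag('<r xmlns="urn:example:ns"/>')

# The names differ only because the strings differ; nothing was dereferenced.
print(http_style)
print(urn_style)
print(http_style == urn_style)
```

For namespace processing alone, either string works the same way as long as it is unique.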

How can one spec generate so much nonsense?

len

-----Original Message-----
From: Paul Prescod [mailto:paul@prescod.net]
Sent: Thursday, May 22, 2003 4:34 PM
To: Bullard, Claude L (Len); WWW-Tag
Subject: Re: Talked to the xml.gov people


Bullard, Claude L (Len) wrote:

> Then why does it make a bit of difference what they use
> as the string?

One string has more information than the other. It says: "if you want 
more information about this object and you don't know where to find it, 
use the HTTP protocol and see what you can find out."

It's as simple as that: one string has more information than the other. 
There are some URN syntaxes that embed HTTP URIs and therefore add yet 
more syntax. I think that those are reasonable although I don't think 
they offer much advantage.

> o  URL HTTP because they MIGHT want to dereference it and as
> experience proves, HTTP URLs are always dereferenceable even
> if they return 404.  The policy is global and implemented in
> every browser of interest.

Fair enough. Note that you could also dereference an HTTP URL using a 
catalog or registry. For instance, Google's archive is a nice catalog 
that gives you alternate (historical) representations of HTTP URLs. And 
SGML SOCATs explicitly allow mapping from system identifiers to system 
identifiers.

"The SYSTEM keyword indicates that an entity manager should use the 
associated storage object identifier to locate the replacement text for 
an entity whose external identifier's system identifier is explicitly 
specified by the system identifier."

So it isn't just theoretically possible, it is implemented in nsgmls, 
jade and other SP-based tools.

/tmp/sptest> cat CATALOG
SYSTEM "http://www.w3.org/foo.dtd" "b.ent"

/tmp/sptest> cat test.sgm
<!DOCTYPE foo[
   <!ELEMENT foo - - (#PCDATA)>
   <!ENTITY bar SYSTEM "http://www.w3.org/foo.dtd">
]>
<foo>
&bar;
</foo>
/tmp/sptest> cat b.ent
Len
/tmp/sptest> onsgmls test.sgm
(FOO
-Len\n
)FOO
C
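
The same lookup can be sketched outside SP. This is a rough Python illustration, not how nsgmls is implemented: it assumes a catalog containing only one-line SYSTEM entries, and parse_catalog, resolve and demo are hypothetical helper names. The HTTP system identifier is used purely as a key into the catalog, and the network is never touched:

```python
import os
import shlex
import tempfile

def parse_catalog(text):
    """Return {system_id: local_file} for the SYSTEM entries in a catalog."""
    mapping = {}
    for line in text.splitlines():
        tokens = shlex.split(line)  # handles the double-quoted literals
        if len(tokens) == 3 and tokens[0].upper() == "SYSTEM":
            mapping[tokens[1]] = tokens[2]
    return mapping

def resolve(system_id, mapping, base_dir):
    """Read the entity text from the mapped local file, with no HTTP involved."""
    local = mapping.get(system_id)
    if local is None:
        raise LookupError(f"no catalog entry for {system_id!r}")
    with open(os.path.join(base_dir, local), encoding="utf-8") as f:
        return f.read()

def demo():
    """Recreate the CATALOG and b.ent from the transcript in a temp directory."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "b.ent"), "w", encoding="utf-8") as f:
            f.write("Len\n")
        catalog = parse_catalog('SYSTEM "http://www.w3.org/foo.dtd" "b.ent"')
        return resolve("http://www.w3.org/foo.dtd", catalog, d)

print(demo(), end="")
```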

> It's a system trap either way except that the URN gives
> the owner the ultimate choice as to what dereferencing
> mechanism is used and the W3C more or less owns HTTP.

The application processing the data _always_ has the ultimate choice how 
to dereference and can choose NOT to use HTTP, as SP does above. But 
given an opaque URN they do NOT have the choice of ripping it apart to 
find an internet resource they can consult for help. One way gives MORE 
FUNCTIONALITY than the other.
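
To make "ripping it apart" concrete, here is a small sketch using Python's urllib.parse.urlsplit. The helper name consultable_host and the URN urn:example:foo-dtd are made up for illustration. An HTTP identifier yields a host you could consult for help; the opaque URN yields nothing comparable:

```python
from urllib.parse import urlsplit

def consultable_host(identifier):
    """Return a host to ask for more information, or None if there is none."""
    parts = urlsplit(identifier)
    if parts.scheme in ("http", "https") and parts.netloc:
        return parts.netloc
    return None

print(consultable_host("http://www.w3.org/foo.dtd"))
print(consultable_host("urn:example:foo-dtd"))  # hypothetical URN
```

The application remains free to ignore the host and use a catalog instead; the point is only that the option exists for one string and not for the other.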

> The rest of us have also watched this sleight of hand
> long enough and we do get it.  It simply comes down
> to the single system ambitions of the W3C and whether
> or not xml.gov buys into that.  If they do, then they
> should use a URL (no, not URN, no not URI, no not IRI)
> and put something at the end of it to keep from
> confusing those who don't get it.  Otherwise, use a
> URN and maintain absolute content independence of
> the system.  Choose one.

The HTTP URI/URL is context dependent and does not require the HTTP 
protocol. I just unplugged my computer from the network and tried the 
trick above and it still worked. It DOES NOT DEPEND on the W3C or HTTP. 
It is just a string of characters and how you interpret it is up to you. 
If you want to interpret it as an index into a catalog, more power to 
you: so does Apache. So does Squid. So do IE and Mozilla (when they are 
looking into their caches, rather than downloading).

This is running code, not theory. IIRC it worked this way from some time 
in the mid-90s.

  Paul Prescod
Received on Thursday, 22 May 2003 17:54:34 UTC