Re: Talked to the xml.gov people

Bullard, Claude L (Len) wrote:

> Then why does it make a bit of difference what they use
> as the string?

One string has more information than the other. It says: "if you want 
more information about this object and you don't know where to find it, 
use the HTTP protocol and see what you can find out."

It's as simple as that: one string has more information than the other. 
There are some URN syntaxes that embed HTTP URIs and therefore add yet 
more syntax. I think those are reasonable, although I don't think 
they offer much advantage.

> o  URL HTTP because they MIGHT want to dereference it and as
> experience proves, HTTP URLs are always dereferenceable even
> if they return 404.  The policy is global and implemented in
> every browser of interest.

Fair enough. Note that you could also dereference an HTTP URL using a 
catalog or registry. For instance, Google's archive is a nice catalog 
that gives you alternate (historical) representations of HTTP URLs. And 
SGML SOCATs explicitly allow mapping from system identifiers to system 
identifiers.

"The SYSTEM keyword indicates that an entity manager should use the 
associated storage object identifier to locate the replacement text for 
an entity whose external identifier's system identifier is explicitly 
specified by the system identifier."

So it isn't just theoretically possible; it is implemented in nsgmls, 
jade and other SP-based tools.

/tmp/sptest> cat CATALOG
SYSTEM "http://www.w3.org/foo.dtd" "b.ent"

/tmp/sptest> cat test.sgm
<!DOCTYPE foo[
   <!ELEMENT foo - - (#PCDATA)>
   <!ENTITY bar SYSTEM "http://www.w3.org/foo.dtd">
]>
<foo>
&bar;
</foo>
/tmp/sptest> cat b.ent
Len
/tmp/sptest> onsgmls test.sgm
(FOO
-Len\n
)FOO
C
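
For anyone who hasn't used SP catalogs, here is a minimal Python sketch of the lookup the transcript relies on: read the SYSTEM entries of a catalog and, when a system identifier matches, use the local storage object identifier instead of touching the network. The function names are hypothetical and the grammar is deliberately simplified (shell-style quoting, no `--` comments, other entry types ignored); this is not how SP implements it.

```python
import shlex


def parse_system_entries(catalog_text):
    """Map system identifiers to storage object identifiers using only the
    SYSTEM entries of a catalog (hypothetical helper).

    Simplifications (assumptions, not the full SOCAT grammar): tokens are
    split shell-style, so literals may be quoted with ' or "; catalog
    comments (-- ... --) and all other entry types are not handled.
    """
    tokens = shlex.split(catalog_text)
    mapping = {}
    i = 0
    while i < len(tokens):
        # Match the keyword case-insensitively.
        if tokens[i].upper() == "SYSTEM" and i + 2 < len(tokens):
            # Keep the first entry seen for a given system identifier.
            mapping.setdefault(tokens[i + 1], tokens[i + 2])
            i += 3
        else:
            i += 1
    return mapping


def resolve_system_id(system_id, mapping):
    """Return the catalog's replacement, or the identifier unchanged."""
    return mapping.get(system_id, system_id)


catalog = 'SYSTEM "http://www.w3.org/foo.dtd" "b.ent"\n'
mapping = parse_system_entries(catalog)
print(resolve_system_id("http://www.w3.org/foo.dtd", mapping))     # b.ent
print(resolve_system_id("http://example.org/other.dtd", mapping))  # http://example.org/other.dtd
```

Nothing in the lookup knows or cares that the key happens to start with "http:"; it is string matching against a local table.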

> It's a system trap either way except that the URN gives
> the owner the ultimate choice as to what dereferencing
> mechanism is used and the W3C more or less owns HTTP.

The application processing the data _always_ has the ultimate choice of how 
to dereference and can choose NOT to use HTTP, as SP does above. But 
given an opaque URN, it does NOT have the choice of ripping it apart to 
find an internet resource it can consult for help. One way gives MORE 
FUNCTIONALITY than the other.
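
To make the "ripping it apart" point concrete, a small Python sketch using the standard library's `urllib.parse.urlsplit`. The helper `lookup_hint` and the URN `urn:example:foo` are hypothetical illustrations: the HTTP string names a host you could ask, while the opaque URN yields only a namespace identifier that needs some resolver agreed on outside the string.

```python
from urllib.parse import urlsplit


def lookup_hint(identifier):
    """Describe what the identifier string alone tells a processor about
    where to look for more information (hypothetical helper)."""
    parts = urlsplit(identifier)
    if parts.scheme in ("http", "https") and parts.hostname:
        # The string itself names a host and a path to ask about.
        return f"ask {parts.hostname} about {parts.path or '/'}"
    if parts.scheme == "urn":
        # Only the namespace identifier (NID) is recoverable; resolving it
        # needs a resolver chosen outside the string.
        nid = parts.path.split(":", 1)[0]
        return f"opaque URN in namespace '{nid}': no host to ask"
    return "no lookup hint"


print(lookup_hint("http://www.w3.org/foo.dtd"))  # ask www.w3.org about /foo.dtd
print(lookup_hint("urn:example:foo"))            # opaque URN in namespace 'example': no host to ask
```

Either string can still be handed to a local catalog first; the HTTP one simply keeps an extra fallback available.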

> The rest of us have also watched this sleight of hand
> long enough and we do get it.  It simply comes down
> to the single system ambitions of the W3C and whether
> or not xml.gov buys into that.  If they do, then they
> should use a URL (no, not URN, no not URI, no not IRI)
> and put something at the end of it to keep from
> confusing those who don't get it.  Otherwise, use a
> URN and maintain absolute content independence of
> the system.  Choose one.

The HTTP URI/URL is context-dependent and does not require the HTTP 
protocol. I just unplugged my computer from the network and tried the 
trick above and it still worked. It DOES NOT DEPEND on the W3C or HTTP. 
It is just a string of characters and how you interpret it is up to you. 
If you want to interpret it as an index into a catalog, more power to 
you: so does Apache. So does Squid. So do IE and Mozilla (when they look 
in their caches rather than downloading).
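
A rough Python sketch of treating the URI as "just a string": a chain of resolvers tried in order, where catalogs and caches are plain key lookups and the network is only one optional strategy at the end. All names and the sample data are hypothetical; real catalogs, proxies and browser caches have their own rules for matching and expiry.

```python
import urllib.request
from typing import Callable, Mapping, Optional, Sequence

# A resolver takes a URI string and returns content bytes, or None on a miss.
Resolver = Callable[[str], Optional[bytes]]


def from_mapping(store: Mapping[str, bytes]) -> Resolver:
    """Treat the URI purely as a lookup key (catalog, proxy or browser cache)."""
    return lambda uri: store.get(uri)


def from_network(timeout: float = 5.0) -> Resolver:
    """Fall back to actually dereferencing the URI over the network."""
    def fetch(uri: str) -> Optional[bytes]:
        try:
            with urllib.request.urlopen(uri, timeout=timeout) as resp:
                return resp.read()
        except (OSError, ValueError):
            # URLError and HTTPError (including a 404) are OSError subclasses.
            return None
    return fetch


def resolve(uri: str, resolvers: Sequence[Resolver]) -> Optional[bytes]:
    """Try each strategy in order; the first non-None result wins."""
    for resolver in resolvers:
        content = resolver(uri)
        if content is not None:
            return content
    return None


# Hypothetical local data standing in for a catalog and a cache.
catalog = {"http://www.w3.org/foo.dtd": b"Len\n"}
cache = {"http://example.org/cached.dtd": b"<!ELEMENT x - - (#PCDATA)>\n"}

offline = [from_mapping(catalog), from_mapping(cache)]
print(resolve("http://www.w3.org/foo.dtd", offline))       # b'Len\n'
print(resolve("http://example.org/missing.dtd", offline))  # None

# With the network appended, it is consulted only when every local lookup misses.
online = offline + [from_network()]
```

Unplugging the machine only removes the last strategy; every lookup that the catalog or cache can answer behaves exactly the same.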

This is running code, not theory. IIRC it has worked this way since some time 
in the mid-90s.

  Paul Prescod

Received on Thursday, 22 May 2003 17:33:38 UTC