URL better than FPI

I'm a little surprised to find myself saying this, but after thinking
about it for a while, I've come to the conclusion that a URL identifier
for XHTML is as good as, and probably slightly better than, an FPI
identifier.

I was pretty convinced that the URL was just a location: such and such a
file on a particular machine, retrieved by a particular protocol, whereas we
want an identifier that says ``this is the XHTML DTD'', independent of
protocol and machine.

But let's look at the URL more carefully:

http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd

It has (more or less) 3 parts: ``http'', ``www.w3.org'', and
``TR/xhtml-basic/xhtml-basic10.dtd'', plus some separators for parsing.

A more logical order is

(1)www.w3.org
(2)http:
(3)TR/xhtml-basic/xhtml-basic10.dtd

because first you connect to a machine, then use the protocol with the
file.
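
As a rough illustration of the split and reordering above, here is a
minimal Python sketch using the standard library's urllib.parse.urlsplit.
The variable names machine, protocol and name are just my labels for (1),
(2) and (3); they are assumptions for this example, not terms from any
specification.

```python
from urllib.parse import urlsplit

url = "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"
parts = urlsplit(url)

# Reorder the pieces into the "logical" order used in the text.
machine = parts.netloc            # (1) the machine name
protocol = parts.scheme + ":"     # (2) the protocol, written with its colon
name = parts.path.lstrip("/")     # (3) the name, without the leading separator

print(machine)
print(protocol)
print(name)
```

The "://" and the leading "/" are the separators for parsing; urlsplit
consumes them and leaves the three parts.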

Let's look at each part, starting with (3).  TR/xhtml-basic/xhtml-basic10.dtd
is just a name.  It is a key that is used in a table to retrieve a file.
The separators are almost meaningless [1], and if the server wanted, the
separators could be : instead of /, or the name could have no structure
whatsoever.

(2) is the protocol.  But because URLs are uniform, the meaning of (2)
is irrelevant to us.  It might as well be quix: or I:.  The protocol is
only important to programmers.  As a programmer, I found it hard to ignore
the fact that this is a protocol, but once you do, you realize it is just
part of the name.

Consider a DOS-like file system.  If you have a CD-ROM on, say, I:, then you
have to use a different protocol to access the CD-ROM than you would use
to access a floppy drive on A:.  But as a user, you don't care; it is
really transparent to you.  The drive letter is just a key that returns a
table, and then you send another key (the file name) to that table to get
the document.  But you can just pretend that the protocol + the file name
is one big key for one big table on the machine (1), www.w3.org.

This brings us to (1), the most important part: the part that seems to
really make URLs a location and not a document name.  But the reality is
there is no machine named www.w3.org.  There are machines named
slow1.w3.org and slow2.w3.org.  These are the machines that actually
serve the table that maps (2) + (3) to documents.

So www.w3.org isn't a machine; it is a virtual machine name.  It is a
name that always maps to a machine that is guaranteed to use the table
keyed by (2) + (3) to map keys to documents.
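
To make the picture concrete, here is a toy model in Python.  It is only
a sketch under my own assumptions: TABLE, VIRTUAL_HOSTS and fetch are
hypothetical names invented for this example, the document is a
placeholder string, and nothing here talks to DNS or HTTP.  The point is
just that protocol + name act as one big key, and the virtual name can
land on any machine that serves the same table.

```python
# One big table keyed by (protocol, name), i.e. by (2) + (3).
TABLE = {
    ("http:", "TR/xhtml-basic/xhtml-basic10.dtd"): "XHTML Basic 1.0 DTD",
}

# The virtual name (1) maps to whichever real machines serve that table.
VIRTUAL_HOSTS = {
    "www.w3.org": ["slow1.w3.org", "slow2.w3.org"],
}

def fetch(virtual_name, protocol, name, pick=0):
    """Return (machine actually used, document) for a virtual name and key."""
    machines = VIRTUAL_HOSTS[virtual_name]
    machine = machines[pick % len(machines)]
    # Every machine behind the virtual name uses the same table,
    # so the document does not depend on which machine was picked.
    document = TABLE[(protocol, name)]
    return machine, document

for pick in (0, 1):
    print(fetch("www.w3.org", "http:", "TR/xhtml-basic/xhtml-basic10.dtd", pick))
```

Whichever machine answers, the same key gives the same document, which
is what lets the virtual name behave like part of an identifier.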

An important point is that the W3C owns all names in (1) of the form
...w3.org.

Now let's look at

-//W3C//DTD XHTML Basic 1.0//EN

It has 3 parts too:
(1) -//W3C
(2) DTD XHTML Basic 1.0
(3) EN

Plus separators for parsing.  (Actually, (1) and (2) each have 2 parts.)
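
The same kind of split can be sketched for the FPI.  This is a minimal
Python example, not a full FPI parser: split_fpi and the dictionary keys
are hypothetical names of my own, and it assumes the usual four fields
separated by "//".  As I understand the SGML convention, the leading "-"
marks an unregistered owner and the first word of the text part is the
public text class.

```python
def split_fpi(fpi):
    """Split a formal public identifier into its '//'-separated fields."""
    registration, owner, text, language = fpi.split("//")
    # The text field has two parts: a class (e.g. DTD) and a description.
    text_class, _, description = text.partition(" ")
    return {
        "registration": registration,  # first half of (1)
        "owner": owner,                # second half of (1)
        "class": text_class,           # first half of (2)
        "description": description,    # second half of (2)
        "language": language,          # (3)
    }

fields = split_fpi("-//W3C//DTD XHTML Basic 1.0//EN")
for key, value in fields.items():
    print(key, "=", value)
```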

(3) just means the document is in English.  We can consider it part of
the name (2).  (2) is just a key in a table.  DTD just means that it is a
DTD, XHTML Basic 1.0 is the name of the DTD, and EN means the comments
are in English.

But (2) + (3) is the same as TR/xhtml-basic/xhtml-basic10.dtd.  The .dtd
says it's a DTD, and xhtml-basic appears repeatedly as the name.  There is
no mark saying it is in English.  One could use .en.dtd as an extension,
but since the document will only ever be in English, omitting it from the
URL doesn't matter.

www.w3.org corresponds to -//W3C.  Both indicate that the following name
(TR/xhtml-basic/xhtml-basic10.dtd or DTD XHTML Basic 1.0) is to be
interpreted as a key for a table controlled by the W3C.  But the W3C
doesn't actually own -//W3C like it owns www.w3.org, and anyone
can make a document with the FPI -//W3C.  So really URLs are better in
this respect.

So surprisingly, the URL is actually independent of machine name (because
of virtual machine names) and independent of protocol (because of
uniformity).

Note this argument applies specifically to the W3C, because they own
www.w3.org.  I personally don't own a domain name, so I'd be better off
using FPIs (-//Russell O'Connor//DTD ...) for my DTDs.

[1] The /'s are important because /foo/./../bar is equivalent to /bar, so
the keys are really strings modulo this equivalence relation.
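
A quick way to see this equivalence is to normalize both spellings and
compare them.  The sketch below uses Python's posixpath.normpath as a
stand-in; that function is meant for file-system paths, so treating it as
the URL rule is an assumption that holds for this simple case but not
necessarily for every edge case.  same_key is a hypothetical helper name.

```python
import posixpath

def same_key(a, b):
    """Treat two paths as the same key if they normalize to the same string."""
    return posixpath.normpath(a) == posixpath.normpath(b)

print(posixpath.normpath("/foo/./../bar"))   # prints /bar
print(same_key("/foo/./../bar", "/bar"))     # prints True
```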

-- 
Russell O'Connor                           roconnor@uwaterloo.ca
       <http://www.undergrad.math.uwaterloo.ca/~roconnor/>
``Paradoxically, a refusal to `put a monetary value on life' means that
life is often undervalued.'' -- Artificial Intelligence: A Modern Approach

Received on Friday, 18 February 2000 11:51:13 UTC