Re: URL better than FPI from Arjun Ray on 2000-02-19 (www-html@w3.org from February 2000)

From: Arjun Ray <aray@q2.net>
Date: Sat, 19 Feb 2000 00:46:29 -0500 (EST)
To: W3C HTML <www-html@w3.org>
Message-ID: <Pine.LNX.4.10.10002182342000.1475-100000@mail.q2.net>
On Fri, 18 Feb 2000, Russell Steven Shawn O'Connor wrote:

> I'm drawing relationships between FPIs and URLs to indicate that they
> perform more or less the same job.

Sure.  

> They are different, and the analogy is not perfect, but I believe
> they are adequate for the purposes of identifying a document by
> name.

Yes.  The key point is identification by *name*.

> > Have you seen K.4.6 "Internet domain names in public identifiers" of
> > the WebSGML TC?  http://www.ornl.gov/sgml/wg8/document/1955.htm
> > You could have something like this:
> > 
> >    +//IDN w3.org::www//DTD 
> >        XHTML 1.0//EN//http:/TR/xhtml-basic/xhtml-basic10.dtd
>  
> Wow, I think I may have heard of this in passing, but forgotten.  This
> seems really good, maybe we should go with it?

I think we should, but if it doesn't, uh, "explain the thinking behind
the specifications", there will be resistance from the W3C.

> > No.  They're exactly the same.  The real problem is that, under
> > the  current rules, a URI can't be the minimum data following the
> > PUBLIC keyword.  Of course, at root, this is just legalistic
> > mumbo-jumbo, and the SYSTEM keyword is the official *kludge* to
> > get around this  "problem".

> Why is it a problem?  Why should PUBLIC identifiers be used over
> SYSTEM identifiers. ... Um, can you define the semantics of PUBLIC
> and SYSTEM identifiers?  Don't bother if it is too much trouble.

In terms of the great Undead Debate - Names vs Addresses - PUBLIC is
basically a "name" and SYSTEM is basically an "address".  There's also
an understanding that PUBLIC is portable across environments whereas
SYSTEM is always necessarily "local", that PUBLIC ids are effectively
permanent while SYSTEM ids normally *do* vary, and so on.  The common
thread is that a PUBLIC id, as a system-independent name, always has
to be translated to a locally effective address.  From an SGML pov,
that's fine, and in fact all that one needs - the emphasis is on the
maintainability and permanence of one's *document data*.  (Think of
read-only media like CD-ROMs - once you've "frozen" the data, like it
or not, those are just names in there, because addresses can and do
change.)  In a sense, SGML is *all* about names, and *only* about
names:)

The problem was that ISO 8879 forgot to include a standard mechanism
for name resolution.  Had there been such a catalog system from the
start, PUBLIC ids (i.e. names) in documents would have sufficed.
Instead, the whole issue of catalogs was punted in favor of a quick
and dirty kludge, SYSTEM identifiers.  Of course, there were benefits
to this: the more "local" or insulated your system was, the more
convenient it was to use SYSTEM ids directly, saving the "hassle" of
inventing names for non-existent or uselessly indirect catalogs. Well,
so the theory and practice went, except that this tactical shortcut of
using SYSTEM ids as "names" did *not* obviate inevitable changes in
effective addresses.  So by the time the SGMLopen Catalog format came
about, guess what?  SYSTEM ids needed translation too, to "up-to-date"
SYSTEM ids!  If that isn't a kludge coming home to roost, I don't know
what is:)  [Actually, the usages that survived this stupidity were the
extreme minimizations - in SGML you can omit the actual literal after
the SYSTEM keyword, leaving it to the app to "imply" the effective
value.  The basic reason why this didn't survive into XML is that XML
took a hard line on optional features, so if the SYSTEM keyword had to
be there at all - which it did, as catalogs still weren't standardized
- then the literal had to tag along. Bleagh.]

So, it's not PUBLIC ids but SYSTEM ids that are the real "baggage".
Unfortunately, we're also stuck with the formal variant of PUBLIC ids
(FPIs), which has been a big problem until the WebSGML TC offered an
extension for networks (the +//IDN registered owner class), which now
gives us a way to stick the informational content of a URL into an FPI.
A win/win:)
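
To make the "informational content" point concrete, here is a rough
sketch (Python, purely illustrative) that splits the straw example
quoted above on its "//" delimiters.  The function name and the idea
that the last field carries the URL-ish tail are my assumptions for
illustration, not anything the WebSGML TC defines:

```python
def split_fpi(fpi):
    """Split an FPI-like string on its '//' delimiters.

    Whitespace (including line breaks) is collapsed first, since the
    example in this thread is wrapped across two lines.
    """
    normalized = " ".join(fpi.split())
    return normalized.split("//")


# The straw example from this thread, wrapped as it was posted.
example = """+//IDN w3.org::www//DTD
    XHTML 1.0//EN//http:/TR/xhtml-basic/xhtml-basic10.dtd"""

fields = split_fpi(example)
for field in fields:
    print(repr(field))
```

Note that the single slash in "http:/TR/..." is what keeps the tail
from being split any further.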

> I don't see why you say they are the same in that regard.

In that what we *need* is to put the URL into a PUBLIC id, i.e. that
the literal that follows the PUBLIC keyword should be a URL, or its
informational equivalent.  There is absolutely no reason why we should
not be able to do this.  We lose nothing and get exactly what we want
- so in the sense of performing the same essential function, except
for a bunch of outmoded rules (now thankfully modified) they *are* the
same.  Your argument has been that this should be possible, and I
agree:)

The illusion - and fallacy - is to believe that we need the *SYSTEM*
keyword for URLs.  No, SYSTEM is kludgery and baggage.  With a
suitable definition, PUBLIC works just fine and is all we need.  We
also put the name resolution problem - when and where we face it - in
the proper perspective - translate PUBLIC via a catalog (with no-ops
as convenient) and dispense with SYSTEM ids and SYSTEM->SYSTEM
translations altogether.
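
A minimal sketch of what I mean by "translate PUBLIC via a catalog
(with no-ops as convenient)", in Python.  The catalog here is just a
dict, the local path is made up, and treating an http/ftp PUBLIC id as
its own address is my assumption about what a "no-op" would be; this
is not the SGMLopen Catalog format:

```python
# Hypothetical catalog: PUBLIC id (a name) -> locally effective address.
CATALOG = {
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        "/usr/local/share/sgml/xhtml1-strict.dtd",  # made-up local path
}


def resolve(public_id, catalog):
    """Return a locally effective address for a PUBLIC id.

    A catalog entry wins.  Failing that, an id that already looks like
    a URL is returned unchanged (the "no-op" case).  Anything else is
    an unresolved name.
    """
    if public_id in catalog:
        return catalog[public_id]
    if public_id.startswith(("http://", "ftp://")):
        return public_id
    raise LookupError("no catalog entry for PUBLIC id: " + public_id)
```

No SYSTEM id appears anywhere: the document carries only the name, and
the catalog (or the no-op) supplies the address.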

I must be dreaming:)

> I can't say that catalogs are never necessary, because I don't believe
> it.  But in this case the URL is as good as (and probably better than)
> the FPI I gave. ... Although your identifier seems better than the URL,
> and I think you may have a point about PUBLIC vs SYSTEM, I need to look
> into that.
>   
> > > So surprisingly, the URL is actually independent of machine name
> > > (because of virtual machine names) and independent of protocol
> > > (because of uniformity).
> > 
> > Please explain this "uniformity" bit.  What happens with ftp?

> I admit this is the most confusing part of my argument.  Consider
> a DOS like system.  [...] Consider drives A: and C:. [...] But
> this difference in protocol is transparent to the user, because
> whether a drive is a floppy, or disk, or mapped network drive,
> doesn't change how the table is accessed.  This is because each
> medium is accessed (from the user's point of view) in exactly the same
> way.  This OS has made the access uniform across media.  In the
> same way access via http: and ftp: are done in exactly the same
> uniform ways in a URL. (a mapping from names to data).

OK, I see now.  The problem, then, is that the path component isn't
invariant across this uniformity (i.e. you can't just substitute 'ftp'
for 'http' in a URL and expect it to be operational).  So do we need to
insist on this 'uniformity' concept?  (Especially given my straw
proposal for what a +//IDN FPI might look like?)
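
To illustrate the substitution I mean (Python; the URL is one I have
assembled for the example, and the function name is made up): swapping
the scheme is trivial as a string operation, but nothing about it
guarantees that an ftp server exposes the same path.

```python
from urllib.parse import urlsplit


def swap_scheme(url, new_scheme):
    """Return url with its scheme replaced; host and path are untouched."""
    return urlsplit(url)._replace(scheme=new_scheme).geturl()


# Assumed URL, for illustration only.
http_url = "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"
ftp_url = swap_scheme(http_url, "ftp")

# The result is syntactically valid, but whether it retrieves anything
# depends entirely on how the ftp server lays out its paths.
print(ftp_url)
```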


Arjun
Received on Saturday, 19 February 2000 00:21:48 UTC