
Re: URL better than FPI

From: Arjun Ray <aray@q2.net>
Date: Sun, 20 Feb 2000 10:54:13 -0500 (EST)
To: W3C HTML <www-html@w3.org>
Message-ID: <Pine.LNX.4.10.10002200843480.1475-100000@mail.q2.net>


On Fri, 18 Feb 2000, Dan Connolly wrote:
> Arjun Ray wrote:
 
> > In terms of the great Undead Debate - Names vs Addresses - PUBLIC
> > is basically a "name" and SYSTEM is basically an "address". 

ISO 8879 doesn't really explain PUBLIC and SYSTEM.  That was left to
the _SGML Handbook_, I suppose:)  The discussion of 10.1.6 "External
Identifier" (p.377 ff) has this:

: A _system identifier_ is system-specific information that enables
: the entity manager of an SGML system to locate the file or the 
: memory location or the pointer within a file where the entity can be
: found. [...]  It should also be noted that a system identifier could
: be an invocation of a program that controls access to an entity that
: is being identified. [...]
: A _public identifier_ is a name that is intended to be meaningful 
: across systems and different user environments. [...]

Taking these as descriptions of canonical semantics, I think the case
for "Cool URLs" being PUBLIC identifiers is pretty clear cut.  If we
think that there's already too much definitional baggage in or usage
overloading of PUBLIC and SYSTEM, we might be better off with a new
name altogether, say EXTERN, but I'd still maintain that EXTERN would
be more PUBLIC than SYSTEM:)  [Another limited analogy: it seems that
#include <stdio.h> is kinda PUBLIC-ish, while #include "foo.h" is
kinda SYSTEM-ish; both forms are useful, but never simultaneously for
the same entity; which would one want for a clean-compile-anywhere out
of the box?:)]

> The only issue is the constraints on the syntax of what goes
> inside PUBLIC "..." vs. the unconstrained SYSTEM "..." syntax. I
> completely agree that we only need one and that it's awkward that
> XML 1.0 picked the SYSTEM "..." as the required slot, but (if I
> understand correctly) that was the only slot where URIs fit,

Yes.  The Great Triage had a very tight deadline (Nov 1), so rather
than rethink the whole external identifier mess, everything got
included in the maximally expanded form.  There was no inherent reason
to rule out the SGML practice of "implied SYSTEM" (the *superior* and
perhaps only proper use of the SYSTEM keyword, IMHO), but if it was a
question of accommodating URLs at short notice, the system literal
tagging along mandatorily seemed convenient.  (I'm pretty sure that a
lot of discussion was suppressed voluntarily because it was believed
that there would be a 1.1 spec in relatively short order.  Since that
never materialized, we're stuck with a number of decisions that, had a
more generous time limit been allowed, might just have come out quite
differently.)

> and I'd rather have a misnomer in the syntax than trade in
> the benefits of the widely deployed Uniform Identifier syntax.

It's a tough call.

The point, I think, is not to lose sight of the goal.  Right now, we
have a how-to-get-there-from-here problem, but bagging PUBLIC to keep
SYSTEM is not the way to go: if anything, SGML interoperability would
say that a proper catalog mechanism obviates SYSTEM (except perhaps as
a minimization), so that's the one that should be obsoleted over the
longer haul.

I think the syntax deployment issue has a large element of perception
involved.  That is, the compactness is a convenience, but the syntax
per se isn't, necessarily.  Most of the time, the software already
"knows" that it's looking at a URI and proceeds to use it; we're not
at the stage where URIs can be plucked from undifferentiated streams
of text.  In fact, deciding whether a candidate string *could* be a
URI is a very non-trivial exercise, which isn't likely to get any
easier over time:

  http://www.deja.com/=dnc/getdoc.xp?AN=513160002
  http://www.deja.com/=dnc/getdoc.xp?AN=513219055

So, I'd say that deployment of the syntax "as a whole" is much more
fragmented than many of us would like to believe.  In that sense, it's
important (IMHO) not to fall into the KTWSFN [1] mentality which says
that the old way of grokking has gotta be the only way to grok.  On
the contrary, since URIs *do* have internal syntax, isn't it kind of
strange to inhibit or prohibit a *syntactic formalism* like SGML/XML
if it could be convenient?
  
[1] Keeping The Web Safe For Netploder
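
To make the "could this string be a URI?" point concrete, here is a
minimal sketch in Python.  It assumes the generic component-splitting
regular expression from Appendix B of RFC 2396 (unchanged in RFC 3986);
the function names are mine, purely for illustration.  The pattern
accepts every string, so a purely syntactic test says very little.

```python
import re

# Generic component-splitting pattern from Appendix B of the URI syntax RFC.
# Every group is optional, so this matches any string at all.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Scheme syntax: a letter followed by letters, digits, "+", "." or "-".
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_uri_reference(text: str) -> dict:
    """Split text into the five generic components (hypothetical helper)."""
    m = URI_REFERENCE.match(text)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def could_be_absolute_uri(text: str) -> bool:
    """Syntactic test only: is there a well-formed scheme before a colon?"""
    scheme = split_uri_reference(text)["scheme"]
    return scheme is not None and SCHEME.fullmatch(scheme) is not None


if __name__ == "__main__":
    for candidate in (
        "http://www.deja.com/=dnc/getdoc.xp?AN=513160002",
        "tel:+1-444-555-1212",
        "note: see below",  # ordinary prose, yet it passes the test
        "hello world",
    ):
        print(candidate, "->", could_be_absolute_uri(candidate))
```

Note that "note: see below" passes just as readily as the tel: example,
which is the difficulty: without outside knowledge that a string is
meant to be a URI, the syntax alone does not tell you.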

> > A win/win:)
> 
> Hmm... well... it doesn't seem that the "+//IDN ..." syntax
> accommodates URIs that aren't DNS-based; e.g. uuid:23io423oi423oi4
> or oid:12.424.54.34.23.23.45.24 or even mid:l2k3j42lkj3@foo.com
> or tel:+1-444-555-1212 or futurescheme:whatever-goes-here .

  (a) No surprise: IDN is "internet domain name":)
  (b) This is an "issue" in only one situation: the *minimized*
      external identifier of a doctype declaration.  (e.g. is
      the 'file:' scheme really entitled to the 'U' in 'URI'?)

> The "http:" in "http://www.w3.org/" serves not only as a clue
> to what network protocol to try, but also to dispatch between
> completely orthogonal naming schemes.
> 
> In a way, the registry of URI schemes (http:, ftp:, ...) is
> analogous to the registry of registered owner identifiers (+//xyz).

Very reasonable, but that may not be the proper way to look at it,
SGML/XML-wise.  URI schemes are actually *notations*!  That is, it may
*not* be advantageous to "forget" that entities can have notations
attached to them.  (This is actually an exploitation of the normal
"data content" semantic, when it's known that the app will be handling
it anyway.)  It's similar in spirit to the convenience offered by many
browsers, where you don't have to type in the 'http://' of a URL if
the software already "knows what you mean".  Unfortunately, a full
exploitation of the idea needs data attributes, which XML doesn't
have.  But there's probably room for a specialized XML app dealing
with URI schemes specifically (if only to reintroduce ideas that got
left out when FSIs bit the dust, such as local character encodings and
the like.)

> [...] Hmm... in practice, I wonder if it would be easier to get
> ISO to relax the syntactic constraints on PUBLIC identifiers than
> to deploy a convention of mapping +//IDN foo.org/... to
> http://foo.org/... or ftp://foo.org/... or whatever.

ISO9070 becomes a constraint only in relation to FORMAL YES.  Why
couldn't YES be expanded to more than one convention?  (An issue for
WG4, I suppose.)
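
For what the mapping convention in the quoted text might look like,
here is a rough sketch in Python.  Everything in it is an assumption:
the function name, the exact "+//IDN " prefix handling, and the
default scheme are illustrative only, not an agreed convention.

```python
from typing import Optional

IDN_PREFIX = "+//IDN "  # assumed spelling of the registered-owner prefix


def idn_public_id_to_url(public_id: str, scheme: str = "http") -> Optional[str]:
    """Map '+//IDN foo.org/...' to '<scheme>://foo.org/...' (hypothetical).

    Returns None for identifiers that do not use the IDN prefix, since
    this convention has nothing to say about them.
    """
    if not public_id.startswith(IDN_PREFIX):
        return None
    rest = public_id[len(IDN_PREFIX):].strip()
    if not rest:
        return None
    return f"{scheme}://{rest}"


if __name__ == "__main__":
    print(idn_public_id_to_url("+//IDN foo.org/dtds/foo.dtd"))
    print(idn_public_id_to_url("+//IDN foo.org/dtds/foo.dtd", scheme="ftp"))
    print(idn_public_id_to_url("-//W3C//DTD HTML 4.0//EN"))
```

The scheme has to be supplied from outside the identifier, which is
the point of the quoted objection: the name alone does not say whether
http: or ftp: is meant, and non-DNS schemes have no mapping at all.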

> Probably ISO isn't the rate-limiting-factor; probably, the
> consequences of putting PUBLIC "http://..." into deployed SGML
> systems is the deployment constraint. But I haven't done much
> testing.

The only problem I see offhand is that catalog support these days can
also assume partial matches of PUBLIC ids.  If PUBLIC ids were kept
opaque up to the application level, there shouldn't be an issue at all.
(*Really* technically, FORMAL NO should warn off assumptions, but I'm
not confident about this either.)
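
A small sketch in Python of the difference between opaque and partial
matching.  The catalog entries, the file paths, and the reading of
"partial match" as longest-prefix match are all assumptions made for
illustration; real catalog software may behave differently.

```python
from typing import Optional

# Hypothetical catalog: exact PUBLIC id -> local system identifier.
EXACT = {
    "-//W3C//DTD HTML 4.0//EN": "dtds/html40/strict.dtd",
}

# Hypothetical partial-match entries: PUBLIC id prefix -> local directory.
PREFIXES = {
    "http://foo.org/": "mirror/foo.org/",
}


def resolve_opaque(public_id: str) -> Optional[str]:
    """Treat the PUBLIC id as an opaque name: exact match or nothing."""
    return EXACT.get(public_id)


def resolve_with_prefixes(public_id: str) -> Optional[str]:
    """Exact match first, then the longest matching prefix entry."""
    hit = EXACT.get(public_id)
    if hit is not None:
        return hit
    best = max(
        (p for p in PREFIXES if public_id.startswith(p)),
        key=len,
        default=None,
    )
    if best is None:
        return None
    return PREFIXES[best] + public_id[len(best):]


if __name__ == "__main__":
    pid = "http://foo.org/dtds/foo.dtd"
    print(resolve_opaque(pid))         # no entry, left to the application
    print(resolve_with_prefixes(pid))  # silently rewritten by a prefix rule
```

With opaque lookup a URL-shaped PUBLIC id simply falls through to the
application; with prefix matching the catalog starts interpreting the
string's internal structure, which is where surprises could come from.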


Arjun
Received on Sunday, 20 February 2000 10:28:37 GMT
