W3C home > Mailing lists > Public > w3c-sgml-wg@w3.org > February 1997

Re: XML catalog draft

From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
Date: Mon, 17 Feb 97 09:23:31 CST
Message-Id: <199702171604.LAA09958@www10.w3.org>
To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
On Mon, 10 Feb 1997 20:45:01 -0500 Tim Bray said:
>I think we should:
> - allow a PUBLIC keyword/string as appropriate

I agree, *if* we also require, or at least recommend, a basic resolution

Developers with clever new ideas can always add their resolution code as
extensions, and take the market by storm.  Users should however be able
to tell a system "I don't want your (*%&*$^&) extensions, just do the
standard thing".  Which requires having a standard thing the user can
count on.  Hence my preference for requiring a standard resolution
method.  People with existing working systems that use different
resolution methods (special directory names, sequential search through
the net, the I Ching, sortes Vergilianae, what have you) can continue to
use them.  If these systems do not support the standard resolution
method, they won't interoperate with other XML systems, and so they
shouldn't claim to be XML conformant.  If they want to be XML conformant
they should do some work to ensure interoperability.  If we can't bring
ourselves to require anything at all, preferring to make developers like
us at the expense of users, who will curse the days we were born, at
least we should have the nerve to make a recommendation.)

> - make a decision as to what should be done when both PUBLIC and
> SYSTEM are there


(Answer to Lee:  the system ID is there as a fallback for when I process
the document with parsers that don't support socats, and in some cases
don't support PUBLIC at all.  It is not there to override the PUBLIC

(Answer to David:  good point about complex resolution strategies.
Provide them as a user-selectable option, if you like, but don't
*REQUIRE* XML users to live through the same incompatibility hell that
SGML users currently put up with.  The simple strategy PUBLIC before
SYSTEM works and judicious catalog handling can make it work precisely
as I like, all the time, as long as the system looks for a local catalog
before looking for the catalog that came with the document.  Judicious
catalog handling would be a lot easier if the socat were an SGML
document instance, but that's apparently water under the bridge now.)

> - provide a pointer to TR9401 in the write-up on the PUBLIC
>   identifier, saying that this is one proven-in-practice way to
>   resolve them

I like Paul Prescod's idea (seconded by Jon and others) of publishing
the subcommittee draft (with extensions) as a separate W3C WD on
catalogs and catalog-based resolution.

> - provide support to catalogs by providing a special PI, as
>   recommended by a couple of WG-ers, to help define the BASE and
>   thus find the catalog

Right.  But local catalogs need to come first, or I'll never get my
ICADD-enhanced HTML, I'll always get the naked ICADD-deficient HTML on
people's servers.

> - investigate the problem of what seems like the unnecessary
>   restrictions on MINIMUM LITERAL; I don't think it's legitimate to
>   say that a PUBLIC identifier can't be a URN, which this would do.

Right.  Until and unless WG8 changes the character set for minimum
literals, we can't change the set allowed in PUBLIC identifiers.

I assume that '#' is forbidden in minimum literals because ISO 646
allows pound-sterling to occur at the same location (or did when
8879 was published; I assume it still does but haven't seen the
current 646).  Since in general I think the principle of allowing, in
minimum literals, only what is really safe across systems, this seems
like a good decision by WG8, and a really stooopid one by the
designers of URLs.  Now that we are stuck with trying to achieve
compatibility with this stoopid choice, I think maybe 8879 should
allow #.  Implementors will know, or should know, to be able to
handle pound-sterling as well.


Reading the thread on the catalog draft persuaded me we need several
things I have not yet found on the net:

  - a clear tutorial on URL syntax, explaining the canonical positions
on URL structure, meaning of '#', '?', ';', etc., preferably with
references to the RFCs that actually define the stuff
  - a clear tutorial on the FPI syntax defined by ISO 9070, for
those of use who understand the FPI syntax of 8879 but not 9070.

If anyone can suggest URLs where such tutorials can be found, you'll
have my undying gratitude.  (URLs for clear treatments of MIME and
MIME/SGML would also be gratefully accepted.)

-C. M. Sperberg-McQueen
Received on Monday, 17 February 1997 11:05:00 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 20:25:07 UTC