Re: draft proposal for catalog resolution [market distinction] from lee@sq.com on 1997-03-31 (w3c-sgml-wg@w3.org from March 1997)

From: <lee@sq.com>
Date: Mon, 31 Mar 97 01:56:07 EST
To: w3c-sgml-wg@w3.org
Message-Id: <9703310656.AA21891@sqrex.sq.com>

I've heard the following sentiment several times:

> Let this be an area for market differentiation. As long as the processor
> faithfully tries to follow the instructions from the author, in the
> remote catalog, its mechanisms for finding "better" instructions should
> be unconstrained. It could be a remote catalog, a URN resolution service
> or annoying popup dialog box. Let the market decide.

I can't imagine many people choosing which XML product to buy based
chiefly on the way it resolves PUBLIC identifiers.

We don't often get asked how Panorama does this when people are choosing
between Panorama Pro and DynaText, that's for sure.  People work out
which is cheaper, or which will work in their own environment, or
which can be customised more easily to interoperate with some particular
document management system.


No, Paul and others, the market won't decide PUBLIC for us.

Author/Editor and Arbortext's Adept Editor may have similar initials,
but they use incompatible PUBLIC resolution methods, and although we'd
like very much to move to CATALOG soon, we haven't seen sales suffer
because of that particular issue.  If we had, we'd have changed
Author/Editor long ago.

No, that's not what incompatibilities will do.

What they will do is frustrate users.

Once people have bought Author/Editor and RulesBuilder, then their
fun begins, and the sound of corks popping is heard throughout the
land, on the grounds that three days of partying is more enjoyable
than understanding how rb.map and extid.map work.  But by then they
are for the most part blaming the complexities of SGML, not us --
and they are looking forward to seeing it fixed with XML.

But if CATALOG isn't required, an XML A/E would continue to use
extid.map, I expect.  Why change when you're having so much fun??

The reasoning has to be based on what will maximise interoperability.
Our market is not going to prefer one mutually incompatible browser
or editor or whatever over another -- it is going to prefer HTML or
PDF, where these hassles go away.

If we make the mistake of allowing PUBLIC, we have at least to _try_
and ensure that every XML processor can handle every XML file on
the web without human intervention.  That includes no intervention
by system administration.

In non-web environments, different amounts of intervention may be
acceptable.

But let's _try_ to make it work.

Paul's CATALOG proposal as redrafted & posted by Michael is a good step.

If it is accepted as a minimum requirement for all XML processors,
even DTD-less ones, we've probably lost our Dirty Perl Hacker.
If it is optional, we have an optional language feature.

People hoping to put URNs in PUBLIC identifiers will have to check
that it's OK not to have ! @ # % ^ & _ { } [ ] | \ ~ ` ; < > , in
URNs, as they are forbidden in PUBLIC identifiers.  Perhaps SGML
could be changed here, as there doesn't seem to be any advantage to
restricting the character set.  It's going to look odd to allow
Kanji or Devanagari or accented Latin characters in SYSTEM IDs
such as file names and URLs, and yet have only A-Za-z 0-9 and a
little punctuation in PUBLIC identifiers, which are supposed to be
more powerful.  (URL internationalisation is in progress, but file:
URLs are already OK in practice at least, and you can escape
characters in URLs with %, a character not allowed in a PUBLIC Id.)

Probably we'll need to reserve yet another character for escaping,
since you can't (I think) use &#999; in a PUBLIC identifier.
Well, even if you can, the resulting data character has to be legal
there.  So we could say that ?dddd? would be the escape for an
arbitrary Unicode character.

So + for space in URLs, %dd for other characters in URLs, &#dddd;
in text, and ?dddd? only in PUBLIC identifiers.  Still want it?
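
To make the cost concrete, here is a minimal sketch of what a ?dddd? escape
might look like in practice. Nothing here comes from a spec: the function
names are made up, dddd is assumed to be the decimal code point with no
fixed width, and the SAFE set is a deliberately conservative guess at
"A-Za-z 0-9 and a little punctuation". Note that a literal ? has to be
escaped too, since it is doing double duty as the delimiter.

```python
import re

# Assumed set of characters that may appear unescaped in a PUBLIC
# identifier. This is a conservative illustration, not the SGML definition.
SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    " '()+-./:="
)

def escape_public_id(text):
    """Replace every character outside SAFE (including '?') with ?dddd?."""
    return "".join(ch if ch in SAFE else "?%d?" % ord(ch) for ch in text)

def unescape_public_id(text):
    """Turn each ?dddd? sequence back into the character it stands for."""
    return re.sub(r"\?(\d+)\?", lambda m: chr(int(m.group(1))), text)
```

Every tool that reads or writes PUBLIC identifiers would need both halves,
on top of the URL and character-reference escapes it already has.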

Here are six ways of including a DTD fragment:

[1]
    <!DOCTYPE xx PUBLIC "yy">

[2]
    <!DOCTYPE xx SYSTEM "how to get yy">

[3]
    <!DOCTYPE xx [
	<!ENTITY % yy PUBLIC "yy">
	%yy;
    ]>

[4]
    <!DOCTYPE xx [
	<!ENTITY % yy SYSTEM "how to get yy">
	%yy;
    ]>

[5]
    <!DOCTYPE xx [
	<!ENTITY % catalog SYSTEM "how to get catalog.xml">
	<!--* catalog.xml defines the yy entity *-->
	%yy;
    ]>

[6]
    <!DOCTYPE xx [
	<!ENTITY % catalog PUBLIC "catalog.xml">
	<!--* catalog.xml defines the yy entity
	    * but this relies on external PUBLIC resolution to
	    * get our real XML catalog
	    *-->
	%yy;
    ]>

Do we have enough features yet?

Why isn't the catalog file itself in XML?

We now have three languages:

[1] XML, in the body of a document

[2] The baroque SGML DTD syntax

[3] the CATALOG file

Well, with DSSSL another one is coming, I expect.
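
As a rough illustration of what that third language costs an implementor,
here is a sketch of the extra parser and lookup step. It assumes entries of
the SGML Open style, PUBLIC "public id" "system id", one mapping per entry;
the proposal under discussion may differ in detail. The function names, the
sample identifier and the example.org URL are all hypothetical.

```python
import re

# Assumed entry syntax: PUBLIC "public identifier" "system identifier"
ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')

def parse_catalog(text):
    """Return {public identifier: system identifier} for PUBLIC entries."""
    return {pubid: sysid for pubid, sysid in ENTRY.findall(text)}

def resolve(public_id, system_id, catalog):
    """Prefer the catalog mapping; otherwise fall back to the SYSTEM id."""
    return catalog.get(public_id, system_id)

# Hypothetical catalog text, for illustration only.
sample = '''
PUBLIC "-//Example//DTD yy//EN" "http://example.org/dtds/yy.dtd"
'''
catalog = parse_catalog(sample)
```

Small as it is, this is a parser the processor needs in addition to the
ones for XML and for DTD syntax.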

It's all just too much, especially when the gains of using PUBLIC
seem so small.  With [5] above you get almost all the benefits anyway,
with a lower implementation cost, one fewer language, and heck, if
we removed SYSTEM xxx from DOCTYPE, a smaller language!

The missing benefit is that it's harder to do distributed resource
mirroring without standardised names, as per Michael's TEI example.
But we haven't solved that problem, and if the URN group solves it,
you can put URNs in your SYSTEM identifiers, and you _still_ don't need
PUBLIC.

Lee
Received on Monday, 31 March 1997 01:56:09 UTC