Re: XML catalog draft from Paul Prescod on 1997-02-09 (w3c-sgml-wg@w3.org from February 1997)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Sun, 09 Feb 1997 09:50:32 -0500
To: w3c-sgml-wg@w3.org
Message-ID: <32FDE438.3F3C@csclub.uwaterloo.ca>
lee@sq.com wrote:
> > Similarly, SGML did not specify a special syntax for system identifiers or
> > a resolution mechanism for them. Thanks to that "omission" we can now use URLs
> > in XML. Once again, that was a Good Decision.
> That "omission" has been corrected, and SGML now has FSIs, because of
> problems with SGML systems not being sufficiently interoperable in practice.

I don't think that FSIs meet your criteria of being usable and reliable in
the same way everywhere. They seem to me to be just a syntax for saying: 
"Here's something you may or may not understand. It has notation FOOBAR,
so if you know how to deal with that, you know how to deal with this."

So they are still open-ended in the way I am arguing public identifiers
should be. But I'm certainly happy to be educated if I'm wrong.
 
> I think that again we are at cross-purposes.  I am not proposing mandating
> a single resolution mechanism.  I am saying that the proposal has to work,
> in the sense that its proponents have to show how you can deliver XML
> on the web (our Purpose) using it.

What I am arguing for, primarily, is a separation of baby and bathwater. We
can add a default catalog resolution mechanism. I voted for that. But some
may still vote against it. Jon is voting for a powerful and complex 
delegation mechanism. I voted against that. James wants us to figure out
the interoperation with URNs and FSIs and everything else.

The baby is the simple, opaque PUBLIC string that SGML people use to good
effect today. The bathwater is the resolution argument which may or may
not ever be resolved. If we can suggest a solution to the latter, great!
Let's do it. I'm just arguing that we needn't tie the former to the latter.

What about interoperability? Interoperability is primarily what we came here
for, but I don't think it is the *only* thing we came here for. For instance,
we decided to let XML have processing instructions, despite the fact that
we know that they can cause problems. But we decided that in the tug of war
between a) the private interoperability cost and b) the private "usefulness" 
benefit, they were worth keeping. We made the same choice with character sets
(though you can imagine the potential interoperability costs there are huge).
What about the fact that you can use arbitrary URL schemes, even some that have
not been invented? Or link to arbitrary binary objects that some clients
may not know how to render?

In fact, we made the same choice with DTDs. If we wanted to take 
interoperability to the extreme, we could fix the tag set available in XML.
You and I know that multiple tag sets are a MASSIVE source of interoperability
problems. But they are also a massive source of POWER and usefulness, and that's
why we're here.

On the other hand, we went the opposite way on some choices. For instance 
requiring a concrete delimiter set is a little less useful than allowing 
an abstract set, but it is much more interoperable.

Back to the point: the baby is the string "PUBLIC foobarbaz". By itself, it
can't hurt anything. Used in conjunction with a system identifier, it can be
skipped with a call to a single C function, and totally ignored. 
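To make the "single function call" point concrete, here is a minimal sketch, written in Python rather than C for brevity, of an application that parses an external identifier and simply discards the public part. The function names and the public identifier "-//Example//DTD Memo//EN" are hypothetical; the sketch lets the public literal stand alone, as SGML allows, and does not check the characters inside either literal.

```python
import re

# A quoted literal in either quote style: "..." or '...'.
_LITERAL = re.compile(r'"([^"]*)"|\'([^\']*)\'')

def parse_external_id(text):
    """Split e.g. PUBLIC "-//Example//DTD Memo//EN" "memo.dtd" into
    (public_id, system_id). Either element may be None."""
    parts = text.strip().split(None, 1)
    if not parts:
        raise ValueError("empty external identifier")
    keyword = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # finditer leaves the unmatched alternative's group as None.
    literals = [m.group(1) if m.group(1) is not None else m.group(2)
                for m in _LITERAL.finditer(rest)]
    if keyword == "SYSTEM" and len(literals) == 1:
        return None, literals[0]
    if keyword == "PUBLIC" and len(literals) in (1, 2):
        system_id = literals[1] if len(literals) == 2 else None
        return literals[0], system_id
    raise ValueError(f"malformed external identifier: {text!r}")

def system_id_only(text):
    """An application that does not use public identifiers just
    drops the first element and keeps the system identifier."""
    _public_id, system_id = parse_external_id(text)
    return system_id
```

For example, system_id_only('PUBLIC "-//Example//DTD Memo//EN" "memo.dtd"') yields "memo.dtd"; the public identifier never affects the result.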

If we decide that we cannot work out the interoperability problems today, then
we should not conclude that we must remove the syntactic feature that a) is useful
in legacy systems and systems under construction today and b) will allow us to 
build a globally interoperable system tomorrow.

> Yes, there's a legacy problem, you can say -- if I can have a PUBLIC
> identifier in an XML file that is ignored by all standard conforming
> applications, then I can have my own private non-XML system using
> them and that gives me a nice feeling.  (is that a fair summary?)

I would put it stronger than "gives me a nice feeling." I would put it:
"SGML based systems already know how to use them, and use them to good
effect, and some people are building large, useful, important systems
based upon them. I suspect that they could also be well used by new 
systems such as HTML editors and website maintainers." And the system 
only becomes "non-XML" if we make them illegal in XML.

> But when XML 2 comes along (say) and PUBLIC identifiers are now
> required to be sock weaving patterns, you're still hosed.  

How could this happen? The semantics of public identifiers are well
understood, whether the resolution mechanism is or not. Was there any
danger of the SOCAT spec. declaring that all public identifiers must
be sock weaving patterns? Anyhow, there is always the danger that a 
future spec. will tromp on your namespace unless the specifiers are
required to avoid that by namespace segmentation or something.

> And if
> someone else's implementation tries to look up PUBLIC IDs before
> SYSTEM IDs, and produces an error message on failure, your files are
> not interoperable.  By not specifying how PUBLIC IDs work, that's
> the sort of problem we'll have.

If the language of the XML spec makes the system identifier primary,
this will not be a problem. I think that it is clear that if we do
not specify a mechanism for public identifier resolution and catalog
resolution, the system identifier should be primary.
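A minimal sketch of what "the system identifier is primary" could mean for a processor, assuming a simplified SOCAT-style catalog made only of PUBLIC "pubid" "sysid" entries (double-quoted literals, no comments or other entry types). The names load_catalog, resolve and prefer_catalog are hypothetical. An application may still consult its catalog first, for instance to find a local copy, but a failed public-identifier lookup never produces an error while a system identifier is available, which avoids the failure mode Lee describes.

```python
import re

# Simplified catalog entry: PUBLIC "public id" "system id".
# Assumption: both literals are double-quoted and no comments appear.
_PUBLIC_ENTRY = re.compile(r'PUBLIC\s+"([^"]*)"\s+"([^"]*)"')

def load_catalog(text):
    """Map each public identifier to the system identifier given for it."""
    return {pub: sysid for pub, sysid in _PUBLIC_ENTRY.findall(text)}

def resolve(public_id, system_id, catalog, prefer_catalog=False):
    """Return the identifier to fetch; the public id is only a hint."""
    # Optional local preference: use the catalog only if it has an entry.
    if prefer_catalog and public_id in catalog:
        return catalog[public_id]
    # The system identifier is primary and always wins otherwise.
    if system_id is not None:
        return system_id
    # With no system identifier, the catalog is the only remaining option.
    if public_id in catalog:
        return catalog[public_id]
    raise LookupError(f"cannot resolve public id {public_id!r}")
```

Note that an unknown public identifier paired with a system identifier simply falls through to the system identifier; only an entity with neither a usable system identifier nor a catalog entry is an error.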
 
> Perhaps you should give us some clear,
> concrete examples of how PUBLIC helps interchange XML documents
> over the web, or how there is no circumstance, now or future, in
> which it can hinder such interchange, and what other gain is made.

There are several features in XML that could hinder interchange, including 
the fact that XML is a meta-language, but each must be weighed against 
the potential benefit. The fact that XML is a meta-language will certainly
impede interoperability (think of indexing metadata, or conversion to another
file format), but the power offered by that feature is larger. I believe the 
same would be true of PUBLIC identifiers in the event that we cannot agree 
on a default handling mechanism for them.

 Paul Prescod
Received on Sunday, 9 February 1997 09:46:23 UTC