Re: Client-side-resolved Indirection

At 09:46 AM 12/2/96 CST, Michael Sperberg-McQueen wrote:
>I'm not sure I have understood you both correctly -- if I have, you
>are arguing in favor of adding PUBLIC identifiers without specifying
>how they are to be resolved.  I can't believe my eyes.

Well, I'm glad someone came out in favour of the other side, because I
thought there might be people who felt that way, but were not saying
something, or that perhaps I was not being clear in my proposal.

Believe your eyes, Michael.

>True, adding the syntax without describing the semantics would only add
>a few lines to the specification.  It seems to me, however, that it
>would blow XML out of the water and more or less completely eliminate
>the chances that independent implementations would inter-operate except
>by accident.

Or by design. People other than this WG *can* design systems.

>I think this boils down to having a paragraph in the spec which says
>(possibly in different words) "PUBLIC identifiers are magic cookies,
>which are resolved using a Magic Cookie Resolution Algorithm which lies
>outside the scope of this specification.  As a consequence, no two XML
>processors are likely to resolve them in the same way; implementors are
>encouraged to distinguish their processors by the ingenuity of their
>resolution methods, using SGML Open catalogs, clever directory path
>tricks, ad hoc configuration files, magic flashing lights and sirens, or
>other mechanisms."

Right! You've hit the nail on the head in a couple of respects. First,
document *authors* should not care about the resolution mechanism of
document *consumers*. They should be totally independant.

When you download an SGML document from the web, and download a catalog with
it, you have, in my opinion, basically abused the catalog feature. It is a
*useful* abuse, and one that is perhaps unavoidable in SGML, but not
necessarily in XML.

Think of it this way: if I have a public identifier -//PAUL in the same
downloadable package as a catalog that says that -//PAUL =
"http://www.paul.com", I have basically *hard coded* "http://www.paul.com"
into the package (though not the document). In other words, I have required
my SGML vendor to implement the whole SGML catalog so that I could get one
lousy level of indirection. As Lee has pointed out, "big deal!" If XML is
designed right I should be able to get that same level of indirection
through entities, through hytime, or through URL redirection. Plus there are
all the system level redirection capabilities (where available). If you know
C++, think of downloadable catalogs as a pointer to a pointer...useful, but
not very exciting.

But PublicIDs are different. They are *names* for things. The C++ equivalent
might be a "resource name", let's say "lpt1:". Now when you code your C++
program you don't *care* how lpt1: is mapped to a physical device, as long
as it is done correctly. The mechanisms for doing this on DOS, Windows, OS/2
and Windows NT are going to be *very* different. But all that is important
to the programmer is the name. (yes, Unix uses a different name for this,
but different names for the same thing is a sociological problem outiside of
the realm of any namespace mechanism)

>We tried that approach once before, in 8879.  It left all the details
>open to the implementors.  The implementors exercised this freedom for a
>while, but ultimately chose (driven by their users? or by the odd desire
>to make 8879 live up to its billing?) to abandon it and specify what
>8879 left unspecified.  Now we have the SGML Open catalog, and PUBLIC
>identifiers work.

First, 8879 demonstrably succeeded. People used the open facility, decided
what features they wanted and standardized. If it had forced a facility from
the beginning, SGML's implementation might have been slowed and the facility
chosen might have been suboptimal.

Second, SGML Open standardization was primarily important because different
applications on the same *system* were using different catalog files. That
meant that every tool had to be configured individually, which was a pain.
But what if I want to make my organization's FPI to SYSId lookup mechanism a
relational database query? I don't see how SGML Open catalogs can be easily
used to implement that. SGML was flexible enough to allow SGML Open catalogs
to exist without restricting us to them. So 8879 was correct to leave the
resolving mechanism open. When URNs are finished, they will obviously be
implemented as a distributed database lookup, not as a flat SGML Open text
file, so it would be even more deadly to standardize on a text file format
now than it would have been in 1986.

Finally, Public Identifers were useful before SGML Open Catalog as I will
demonstrate in a minute.

>To answer the question someone posed:  No, I, for one, did not use SGML
>public identifiers before the advent of the SGML Open catalogue, 

Didn't you ever download documents that used these FPIs in it?

"ISO 8879-1986//ENTITIES Added Latin 1//EN"
"ISO 8879-1986//ENTITIES Greek Symbols//EN"
"ISO 8879-1986//ENTITIES Publishing//EN"

I must admit that I had really expected TEILite (at least!) to have a public
identifier, but you got me there, it doesn't. I guess everybody had better
have their TEILite file named "teilite.dtd". I encourage you to create one.

>no two systems I had resolved them in the same way; 
>with system identifiers, at least, my troubles were limited to the times when I
>moved documents from one system to another.  With public identifiers and
>no common resolution mechanism, I saw no payback at all for the effort
>of learning a different ad hoc odd hack solution for each parser, and
>strewing copies of my DTDs all over my hard disk.  Not all operating
>systems have symbolic links ...

So the problem is with the multiple systems on your machine, not your
machine vs. someone else's machine. Since SGML Open catalogs are available,
I would expect XML processors to use it. But I am *not* prepared to limit
them to it, just as SGML did not. Thanks to that flexibility, I *can* create
a conforming SGML FPI-lookup system based on a relational database, or file
system hack, or grep.

>But one thing seems clear to me:  if we have them, we
>need to specify how to handle them.  If we don't, we are giving up
>without reason on the goal of interoperability and complete explicit
>definition of the language.

Public identifier lookup is a semantic. We have the option of specifying it
or not. I hope I have shown that tying the semantic to a particular
resolution mechanism at this stage of the game would be wrong. But the good
side is that you don't *really* have to give up anything. If you require all
of the vendors of the products that you use to use SGML Open catalogs, or a
subset, then you will have all of the benefits you currently accrue, without
interfering with my ability to require *my* vendors to talk to my relational
database, or my "Handle" URN server or ...

If URNs weren't around the corner, I would suggest that SGML Open should
standardize an API for communication between entity managers and FPI
managers (such as SGML Open catalogs or SQL databases). But URNs will solve
that problem...

 Paul Prescod