Re: What to do given both SYSTEM and PUBLIC? from Joe English on 1997-02-20 (w3c-sgml-wg@w3.org from February 1997)

From: Joe English <jenglish@crl.com>
Date: Thu, 20 Feb 1997 11:30:21 -0800
To: w3c-sgml-wg@w3.org
Message-Id: <199702201930.AA19281@mail.crl.com>
paul@arbortext.com (Paul Grosso) wrote:

> > From: Joe English <jenglish@crl.com>
> > (If the software does not know how to automatically resolve
> > PUBLIC ID's -- which I suspect most software will not, since
> > _nobody_ is sure how to do this yet -- . . .
>
> Regardless of what we decide wrt catalogs, Public IDs, and XML,
> I don't understand Joe's comments above that "nobody knows how
> to [resolve PUBLIC IDs] at all yet."


What I meant was: it's not possible to write a program that will,
given an arbitrary FPI, return the public text denoted by that FPI.

Of course, given an FPI _and a TR9401 catalog_ this is trivial;
or given an FPI and the URL of the resource from which the FPI was
referenced (so that a catalog can be retrieved a la Panorama) it's
also doable.  But the more general problem of mapping FPIs to public
text, given no information other than the FPI itself, is at
present insoluble.

> [...]
> I am not in the business of defending TR9401 against all
> comers, but I do think Joe's statements are poor reflections of reality.

Yes, I did slightly overstate the case :-)

The point I wanted to make was that PUBLIC ID's are useful
even if they *can't* be automatically resolved.

The best example of this is in document type declarations.
When an XML user agent sees

    <!DOCTYPE MYDOC PUBLIC "-//J. English/DTD Joe's Document Type//EN"
				"http://www.art.com/~joe/mydtd.dtd">

it gets three pieces of information.  First, the element type name
of the document element; second, the PUBLIC identifier
of the DTD to which the document conforms; and third, a SYSTEM
ID by which the formal part of the DTD can be retrieved.

The first is redundant since there's going to be a <MYDOC> tag
coming up next anyway as XML forbids tag omission; the third
is in many cases unnecessary since the formal part of the DTD isn't
required to parse the rest of the document.  The second is the
most, if not the only, important piece of information that
an XML UA can get out of the whole <!DOCTYPE ...> declaration.
It tells the program that the document is of the same type as
one beginning with

    <!DOCTYPE MYDOC PUBLIC "-//J. English/DTD Joe's Document Type//EN"
				"http://www.crl.com/~jenglish/mydtd.dtd" >
				<!--* (note different SYSTEM id) *-->

and that it's of a different type than one beginning with

    <!DOCTYPE MYDOC PUBLIC "-//F. Furter/DTD Frank's Document Type//EN"
			    "http://www.transylvania.com/~ffurter/mydtd.dtd">


Even this is not terribly important for Joe's and Frank's
private DTDs, but for shared, community-specific document types
like CML -- which is precisely the sort of application
for which XML holds the most promise -- it's critical.

A PUBLIC identifier in the <!DOCTYPE ...> declaration
is the most reliable way for a document to communicate
to (say) a CML-aware user agent that it is a CML document,
and not some other XML document type that happens to also
be called CML.  It doesn't matter in this case if the UA
can't resolve the FPI; it doesn't need the _formal_ part of
the DTD anyway.  What matters is the _informal_ part, and a
PUBLIC identifier is the only way to identify that.


--Joe English

  jenglish@crl.com
Received on Thursday, 20 February 1997 18:47:46 UTC