Re: draft proposal for catalog resolution

> From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
> 3 Resolving PUBLIC Identifiers (New section)
> An XML processor can resolve a public identifier to a system identifier
> by looking up the public identifier in a supplemental catalog, which has
> the following structure:
> XMLCatalog ::= S? ( ( catComment | pubEntry | otherEntry )
>                 ( S ( catComment | pubEntry | otherEntry) )* )?
> catComment ::= '--*' (Char* - (Char* '*--' Char*)) '*--'
> pubEntry   ::= 'PUBLIC' S PublicID S SystemLiteral
> otherEntry ::= catKeyword (S SystemLiteral)+
> catKeyword ::= (Char* - (S | SystemLiteral | 'PUBLIC'
>                | PublicID | catComment))

I am not capable of determining (at least, not in the time I can
allot to it) whether the above catKeyword production is correct.

What is necessary (and what my writeup attempts) is that the catalog
parser can reliably find the beginning of the next catalog entry
regardless of which "otherEntries" this processor does and doesn't
recognize, and regardless of the number (possibly variable) of
arguments any unrecognized keyword may take.
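
To make that requirement concrete, here is a minimal sketch in Python,
not a definitive implementation.  It assumes the draft grammar quoted
above, in which every argument is a quoted literal, so that any bare
word can be taken as the start of a new entry.  The names TOKEN,
parse_catalog, and public_entries are mine, for illustration only.

```python
import re

# Token shapes assumed from the quoted draft grammar: a comment delimited
# by --* and *--, a quoted literal, or a bare word (taken as a keyword).
TOKEN = re.compile(r"""
    (?P<comment>--\*.*?\*--)
  | (?P<literal>"[^"]*"|'[^']*')
  | (?P<word>[^\s"']+)
""", re.VERBOSE | re.DOTALL)


def parse_catalog(text):
    """Return a list of (keyword, [literal, ...]) entries."""
    entries = []
    for match in TOKEN.finditer(text):
        if match.lastgroup == "comment":
            continue
        if match.lastgroup == "word":
            # A bare word always opens a new entry, known keyword or not.
            entries.append((match.group(), []))
        elif entries:
            # A literal is an argument of the most recent keyword.
            entries[-1][1].append(match.group()[1:-1])
    return entries


def public_entries(text):
    """Keep only well-formed PUBLIC entries; skip everything else."""
    return {args[0]: args[1]
            for keyword, args in parse_catalog(text)
            if keyword == "PUBLIC" and len(args) == 2}
```

An unrecognized keyword with one, two, or five literals is skipped the
same way, since the parser resynchronizes at the next bare word.  The
sketch relies on every argument being quoted; an unquoted argument
would be mistaken for a keyword.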

> A catPublic entry maps a public identifier into a system identifier,
> which may be used to locate the entity itself. For example:
> PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "iso-lat1.gml"
> PUBLIC "-//ACME//DTD Report//EN" "http://www.acme.com/dtds/report.dtd"
> The catalog format is that defined by SGML Open Technical Resolution
> 9401:1995 (Amendment 1 to TR 9401), which defines several keywords in
> addition to PUBLIC. These are matched by the otherEntry rule, and may be
> ignored (or acted on) by XML processors.

Speaking of logical names versus specific locations, I would prefer
that the reference above be to TR9401 (no date/amendment level/version).
I hope to have TR9401:1997 out for vote by SGML Europe.  It would add

> If the public identifier in a catalog entry matches that given in an
> ExternalID, then the system identifier in the catalog entry is
> associated with the public identifier in question and may be used to
> retrieve it. Before matching takes place, both public identifiers must
> be normalized: leading and trailing white space is stripped, and
> embedded white space is replaced by single space (#x0020) characters.
> (Except that no entity references are recognized, this is the same
> normalization as is performed for attribute values of type CDATA.) The

I don't think this bit is necessary.  Neither the & nor % character is
allowed in public identifiers, so why talk of entity references?
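
For what it is worth, the normalization itself is small.  A minimal
sketch in Python, assuming white space means space, tab, CR, and LF;
the function names are hypothetical.  Nothing in it needs to know
about entity references.

```python
import re

# Assumption: "white space" is space, tab, carriage return, line feed.
_WHITE_SPACE = re.compile(r"[\x20\x09\x0D\x0A]+")


def normalize_public_id(public_id):
    """Collapse each run of white space to one space, then trim the ends."""
    return _WHITE_SPACE.sub(" ", public_id).strip(" ")


def public_ids_match(left, right):
    """Compare two public identifiers after normalizing both."""
    return normalize_public_id(left) == normalize_public_id(right)
```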

> catalog lookup may involve more than one catalog file; it ends when the
> first matching entry is found.

TR9401 very carefully talks of one or more "catalog entry files" making
up a logical catalog.  It avoids the phrase "catalog file" since this
seems ambiguous.  I say this for the sake of terminology compatibility with
TR9401--I don't plan to enter into a terminology debate, and I'm willing to
go with any wording for XML so long as it is unambiguous.

> At user option, the XML processor must look first for a catalog file on
> the local system; the location of this catalog file, and the method of
> identifying it, are outside the scope of this specification. If no
> matching entry is found in the local catalog, the XML processor must
> look next in the default catalog.  Unless otherwise provided by
> information outside the scope of this specification (e.g. a special XML
> element defined by a particular DTD, or a processing instruction defined
> by a particular application specification), the default catalog is that
> found using the relative URL catalog . If no matching entry is found in
> either the local catalog (if any) or in the default catalog (if any),
> then the XML processor may treat the catalog lookup process as having
> failed.

First, to comment on the above, I have to define my terms:  in my vocabulary,
there is only one (logical) catalog, effectively composed of an ordered
"list" of catalog entry files.  You can use different terminology
if that is the only way to get catalogs into XML (though I fear confusion
if/when people read/know of TR9401), but I cannot make intelligent comments
without using the terminology carefully.

Given my definition, there is no such thing as a default/local catalog;
there is only the concept of allowing the user (either at the
individual level or at the system-administrator-configurable level) to
specify to the processor an ordered list of catalog entry files.  Said
list might well be "first this local file, then that default file, then
whatever else might have been specified via some PI in the document."

I'm concerned with your two-layer local/default approach.  I think it
is too prescriptive.  In particular, what I want is to look first in
the document-specific catalog entry file (e.g., the "catalog" URL, relative
to the document instance) before looking in the default catalog entry file
on my local system, then next look perhaps at a default catalog entry file
on the system on which I found the document.

Since you are not being explicit about how one might specify any of the
catalog entry files, why be explicit about their number, order, or location?

By way of trying to make a specific suggestion, here is my quick attempt:

  At user option, the XML processor must attempt to locate and process
  one or more catalog entry files in the order specified by the user.
  These files may reside on the local system, in a location relative to
  the entity in which the public identifier was used, or elsewhere.  The
  method of identifying the ordered list of catalog entry files is
  outside the scope of this specification.  If no list of catalog entry
  files has been given to the XML processor, the default list shall
  consist of the single catalog entry file specified by the relative URL
  "catalog".  If no matching entry is found using the list of catalog
  entry files as determined above, then the XML processor may treat the
  catalog lookup process as having failed.
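
The wording above could be sketched as follows in Python.  This is an
illustration under assumptions, not a proposed interface: load_entries
is a hypothetical callable supplied by the processor, and skipping a
catalog entry file that cannot be located is my reading of "attempt to
locate", not something the text spells out.

```python
# The default list: a single catalog entry file at the relative URL "catalog".
DEFAULT_CATALOG_FILES = ["catalog"]


def resolve_public_id(public_id, load_entries, catalog_files=None):
    """Return the first matching system identifier, or None on failure.

    load_entries (hypothetical) takes the location of one catalog entry
    file and returns a dict mapping public identifiers to system
    identifiers; it raises OSError if the file cannot be located.
    """
    if not catalog_files:
        catalog_files = DEFAULT_CATALOG_FILES
    for location in catalog_files:
        try:
            entries = load_entries(location)
        except OSError:
            continue  # assumption: an unlocatable file is simply skipped
        if public_id in entries:
            return entries[public_id]  # lookup ends at the first match
    return None  # the catalog lookup process has failed
```

Note that the function has no notion of "local" or "default" files;
the order of the list carries all of that.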

> If catalog lookup on a public identifier fails, [possibly add: or an
> attempt to retrieve the entity using the result of the catalog lookup
> fails, -Ed.] and a system identifier was supplied in the externalID,
> then an XML processor must behave as if the system identifier was the
> only identifier supplied.