draft proposal for catalog resolution

A proposal for specific wording to describe public identifiers and
their resolution, based on Paul Grosso's posting of the other day but
modified by myself (Paul bears no blame for this), is at

  http://www.uic.edu/~cmsmcq/tech/xml/pisocat.html

and may be read there.  I think the catalog resolution mechanism
described there is (a) simple to implement and use, (b) sufficiently
flexible for the real world, and (c) compatible with full SOCats (check
me on this, Paul!).  I also think it's ready for inclusion in the
XML-Lang spec, but I realize not everyone will see it that way.

So now there are two proposals on the table, substantially identical
(the only difference is in the method of finding the catalog file)
though different in style.

Comments, please.

-C. M. Sperberg-McQueen

p.s. in case there is anyone who would rather read this in email instead
of on the Web, I append an ASCII version of the document, prettied up
from what Charlotte produces.
-------

Web page: HTTP://www.uic.edu/~cmsmcq/tech/xml/pisocat.html

                             Changes to XML
                     To Support Public Identifiers
                     And PI Resolution Using Socats

C. M. Sperberg-McQueen
28 March 1997, rev. 30 March 1997
_________________________________________________________________________

Table of Contents

       * 1 Public Identifier Syntax
       * 2 External Entities (Proposed Revision)
       * 3 Resolving PUBLIC Identifiers (New section)
_________________________________________________________________________

This document describes some possible changes to the draft XML
specification.  The first part describes the changes necessary to make
the XML external identifier rules support public identifiers
syntactically. The second and third parts are sample draft language for
a revision of the section on external entities and for a new section on
resolving public identifiers.

This draft language represents solely the opinion of the author and has
not been approved by anyone, let alone the W3C SGML ERB.

1 Public Identifier Syntax

To allow public identifiers, the productions in the section on external
identifiers would need to be changed as follows:

 ExternalID    ::= SystemID | PublicID
 SystemID      ::= 'SYSTEM' S SystemLiteral
 SystemLiteral ::= '"' (URLchar)* '"'
               |   "'" (URLchar - "'")* "'"
 URLchar       ::= (the set of characters legal in URLs)
 PublicID      ::= 'PUBLIC' S PublicLiteral (S SystemLiteral)?
 PublicLiteral ::= '"' PubLitChar* '"' | "'" (PubLitChar - "'")* "'"
 PubLitChar    ::= (#x000D | #x000A | [A-Z] | [a-z] | [0-9]
                   | ' ' | "'" | '(' | ')' | '+' | ','
                   | '-' | '.' | '/' | ':' | '=' | '?')
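
As a rough illustration of the PubLitChar production above, the
character check can be sketched in Python. The names PUB_LIT_CHARS,
is_public_literal_body and invalid_chars are invented for this sketch
and are not part of the proposal; the sketch checks only the body of a
literal, not its surrounding quotation marks.

```python
import string

# Characters allowed by the PubLitChar production in section 1:
# CR, LF, ASCII letters and digits, space, and a fixed set of punctuation.
PUB_LIT_CHARS = frozenset(
    "\r\n" + string.ascii_letters + string.digits + " '()+,-./:=?"
)


def is_public_literal_body(text):
    """Return True if every character of text matches PubLitChar."""
    return all(ch in PUB_LIT_CHARS for ch in text)


def invalid_chars(text):
    """Return the distinct disallowed characters, in order of first use."""
    seen = []
    for ch in text:
        if ch not in PUB_LIT_CHARS and ch not in seen:
            seen.append(ch)
    return seen
```

For instance, is_public_literal_body("-//ACME//DTD Report//EN") holds,
while an identifier containing "_" or "#" is rejected.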



2 External Entities (Proposed Revision)

If the entity is not internal, it is an external entity, declared as
follows:

ExternalDef   ::= ExternalID NDataDecl?                 /* unchanged */
ExternalID    ::= 'PUBLIC' S PubLiteral ( S SystemLiteral )?
              |   'SYSTEM' S SystemLiteral
SystemLiteral ::= '"' (URLchar)* '"'
              |   "'" (URLchar - "'")* "'"
URLchar       ::= (the set of characters legal in URLs)
URLchar       ::= [a-zA-Z0-9]
              |   ';' | '/' | '?' | ':' | '@' | '&' | '=' | '+'
                                                         /* reserved */
              |   '$' | '-' | '_' | '.' | '!' | '~' | '*'
              |   "'" | '(' | ')' | ','                      /* mark */
              |   '%'                         /* for hex escapes %NN */

/* N.B. control characters, space, and the characters
< > # " { } | \ ^ [ ] ` are not allowed in URLs,
according to the relevant RFCs */

PubLiteral    ::= '"' PubChar* '"' | "'" (PubChar - "'")* "'"
PubChar       ::= Letter | Digit | S | ['()+,-./:=?]

(or change grammar as given above -Ed.)

The PubLiteral and SystemLiteral strings which may occur within the
ExternalID are called the entity's public identifier and its system
identifier, respectively. The system identifier is a URL, which may be
used to retrieve the content of the entity. The public identifier is a
location-independent name for the entity, which can be resolved to a URL
using the mechanism described below.
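
To make the two forms of external identifier concrete, here is a minimal
Python sketch that splits one into its public and system identifiers.
It is only an illustration: the names ExternalID and parse_external_id
are invented here, Python's \s stands in for the S production, and the
sketch does not validate the characters inside the literals.

```python
import re
from typing import NamedTuple, Optional


class ExternalID(NamedTuple):
    public_id: Optional[str]   # None for the SYSTEM form
    system_id: Optional[str]   # None when PUBLIC gives no system literal


# A quoted literal: group 1 if double-quoted, group 2 if single-quoted.
_LIT = r"""(?:"([^"]*)"|'([^']*)')"""
_PUBLIC = re.compile(r"PUBLIC\s+" + _LIT + r"(?:\s+" + _LIT + r")?")
_SYSTEM = re.compile(r"SYSTEM\s+" + _LIT)


def _pick(match, first):
    """Return whichever of two adjacent groups matched (or None)."""
    value = match.group(first)
    return value if value is not None else match.group(first + 1)


def parse_external_id(text):
    """Split an external identifier into public and system identifiers."""
    text = text.strip()
    m = _PUBLIC.fullmatch(text)
    if m:
        return ExternalID(_pick(m, 1), _pick(m, 3))
    m = _SYSTEM.fullmatch(text)
    if m:
        return ExternalID(None, _pick(m, 1))
    raise ValueError("not an external identifier: %r" % text)
```

A declaration such as PUBLIC "-//ACME//DTD Report//EN" with no system
literal yields a system_id of None, which is the case the catalog
mechanism is meant to handle.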

Unless otherwise provided by information outside the scope of this
specification (e.g. a special XML element defined by a particular DTD,
or a processing instruction defined by a particular application
specification), relative URLs are relative to the location of the entity
or file within which the entity declaration occurs. Relative URLs in
entity declarations within the internal DTD subset are thus relative to
the location of the document; those in entity declarations in the
external subset are relative to the location of the files containing the
external subset.
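
The default rule for relative URLs can be sketched with Python's
standard urllib.parse.urljoin. The function name resolve_system_id and
the example locations are hypothetical; the sketch covers only the
default case and ignores any overriding information supplied by a DTD
or application.

```python
from urllib.parse import urljoin


def resolve_system_id(system_id, declaring_entity_url):
    """Resolve a system identifier against the entity that declares it.

    declaring_entity_url is the location of the document (for the
    internal subset) or of the file containing the external subset.
    """
    return urljoin(declaring_entity_url, system_id)


# Hypothetical locations, for illustration only.
doc_url = "http://www.acme.com/docs/report.xml"
dtd_url = "http://www.acme.com/dtds/report.dtd"

# Declared in the internal subset: relative to the document.
in_doc = resolve_system_id("chap1.xml", doc_url)
# Declared in the external subset: relative to the DTD file.
in_dtd = resolve_system_id("iso-lat1.gml", dtd_url)
```

The same relative URL therefore resolves differently depending on
whether the declaration sits in the internal or the external subset.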



3 Resolving PUBLIC Identifiers (New section)

(This section is intended to be inserted immediately following the
section on External Entities. -Ed.)

XML processors may implement any mechanism they choose for resolving
public identifiers (i.e. finding a URL for some physical copy of the
object named). At user option, however, they must also support the
mechanism described in this section.

An XML processor can resolve a public identifier to a system identifier
by looking up the public identifier in a supplemental catalog, which has
the following structure:

XMLCatalog ::= S? ( ( catComment | pubEntry | otherEntry )
                ( S ( catComment | pubEntry | otherEntry) )* )?
catComment ::= '--*' (Char* - (Char* '*--' Char*)) '*--'
pubEntry   ::= 'PUBLIC' S PubLiteral S SystemLiteral
otherEntry ::= catKeyword (S SystemLiteral)+
catKeyword ::= (Char* - (S | SystemLiteral | 'PUBLIC'
               | PubLiteral | catComment))

A pubEntry maps a public identifier to a system identifier, which may
be used to locate the entity itself. For example:

PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "iso-lat1.gml"
PUBLIC "-//ACME//DTD Report//EN" "http://www.acme.com/dtds/report.dtd"

The catalog format is that defined by SGML Open Technical Resolution
9401:1995 (Amendment 1 to TR 9401), which defines several keywords in
addition to PUBLIC. These are matched by the otherEntry rule, and may be
ignored (or acted on) by XML processors.
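
A reader for this catalog format can be quite small. The Python sketch
below is one possible approach, not a reference implementation: it
follows the comment delimiters and entry shapes given in the grammar
above, keeps only PUBLIC entries, and silently skips everything else
(including malformed input such as an unterminated literal). The names
TOKEN and parse_catalog are invented for the example, and the CATALOG
line in the sample merely stands for some keyword other than PUBLIC.

```python
import re

TOKEN = re.compile(
    r"""
    (?P<comment>--\*.*?\*--)      # catComment: --* ... *--
  | "(?P<dq>[^"]*)"               # double-quoted literal
  | '(?P<sq>[^']*)'               # single-quoted literal
  | (?P<word>[^\s"']+)            # bare keyword such as PUBLIC
    """,
    re.VERBOSE | re.DOTALL,
)


def parse_catalog(text):
    """Return (public identifier, system identifier) pairs in file order."""
    tokens = []
    for m in TOKEN.finditer(text):
        if m.group("comment") is not None:
            continue                      # comments carry no entries
        if m.group("dq") is not None:
            tokens.append(("lit", m.group("dq")))
        elif m.group("sq") is not None:
            tokens.append(("lit", m.group("sq")))
        else:
            tokens.append(("word", m.group("word")))

    entries = []
    i = 0
    while i < len(tokens):
        following = tokens[i + 1:i + 3]
        if (tokens[i] == ("word", "PUBLIC")
                and len(following) == 2
                and all(kind == "lit" for kind, _ in following)):
            entries.append((following[0][1], following[1][1]))
            i += 3
        else:
            i += 1                        # other keywords and their literals
    return entries


SAMPLE = '''--* sample catalog *--
PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "iso-lat1.gml"
CATALOG "more.cat"
PUBLIC "-//ACME//DTD Report//EN" "http://www.acme.com/dtds/report.dtd"
'''
entries = parse_catalog(SAMPLE)
```

The public identifiers are returned exactly as written; normalization
is left to the matching step.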

If the public identifier in a catalog entry matches that given in an
ExternalID, then the system identifier in the catalog entry is
associated with the public identifier in question and may be used to
retrieve it. Before matching takes place, both public identifiers must
be normalized: leading and trailing white space is stripped, and
embedded white space is replaced by single space (#x0020) characters.
(Except that no entity references are recognized, this is the same
normalization as is performed for attribute values of type CDATA.) The
catalog lookup may involve more than one catalog file; it ends when the
first matching entry is found.
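
The normalization step is easy to get subtly wrong, so a short Python
sketch may help. The function names are invented for this example, and
white space is assumed to mean space, tab, carriage return and line
feed.

```python
import re

_WHITE_SPACE = re.compile(r"[ \t\r\n]+")


def normalize_public_id(public_id):
    """Collapse each run of white space to one space and trim both ends."""
    return _WHITE_SPACE.sub(" ", public_id).strip(" ")


def public_ids_match(a, b):
    """Compare two public identifiers after normalizing both."""
    return normalize_public_id(a) == normalize_public_id(b)
```

Under this rule a public identifier broken across lines in a
declaration still matches a catalog entry written on one line, but the
comparison remains case-sensitive.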

At user option, the XML processor must look first for a catalog file on
the local system; the location of this catalog file, and the method of
identifying it, are outside the scope of this specification. If no
matching entry is found in the local catalog, the XML processor must
look next in the default catalog.  Unless otherwise provided by
information outside the scope of this specification (e.g. a special XML
element defined by a particular DTD, or a processing instruction defined
by a particular application specification), the default catalog is that
found using the relative URL "catalog". If no matching entry is found in
either the local catalog (if any) or in the default catalog (if any),
then the XML processor may treat the catalog lookup process as having
failed.
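
The lookup order described above (local catalog first, then the default
catalog, stopping at the first match) might be sketched as follows in
Python. This assumes each catalog has already been read into a list of
(public identifier, system identifier) pairs; how the catalogs are
located and loaded is deliberately left out, and all names are invented
for the example.

```python
import re


def _normalize(public_id):
    """Normalize white space as required before matching."""
    return re.sub(r"[ \t\r\n]+", " ", public_id).strip(" ")


def lookup_public_id(public_id, local_catalog=None, default_catalog=None):
    """Return the first matching system identifier, or None on failure.

    Each catalog is a sequence of (public_id, system_id) pairs, or None
    if that catalog is not available.
    """
    wanted = _normalize(public_id)
    for catalog in (local_catalog, default_catalog):
        if catalog is None:
            continue
        for entry_id, system_id in catalog:
            if _normalize(entry_id) == wanted:
                return system_id          # first match ends the lookup
    return None


# Hypothetical catalogs, for illustration only.
local = [("-//ACME//DTD Report//EN", "local/report.dtd")]
default = [
    ("-//ACME//DTD Report//EN", "http://www.acme.com/dtds/report.dtd"),
    ("ISO 8879:1986//ENTITIES Added Latin 1//EN", "iso-lat1.gml"),
]
```

With these catalogs the ACME public identifier resolves to the local
copy, because the local catalog is consulted first.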

If catalog lookup on a public identifier fails, [possibly add: or an
attempt to retrieve the entity using the result of the catalog lookup
fails, -Ed.] and a system identifier was supplied in the ExternalID,
then an XML processor must behave as if the system identifier was the
only identifier supplied.
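
The fallback rule amounts to a two-step choice, sketched here in Python.
The function choose_system_id and its lookup parameter are hypothetical;
lookup stands for whatever catalog mechanism the processor uses and is
assumed to return None on failure. The optional behavior in brackets
above (falling back when retrieval fails) is not modelled.

```python
def choose_system_id(public_id, system_id, lookup):
    """Pick the system identifier to use for an external entity.

    public_id: the public identifier from the declaration.
    system_id: the declared system identifier, or None if absent.
    lookup:    callable mapping a public identifier to a system
               identifier, or to None when catalog lookup fails.
    """
    resolved = lookup(public_id)
    if resolved is not None:
        return resolved
    # Catalog lookup failed: behave as if only the system identifier
    # had been supplied (this may itself be None).
    return system_id


# A dictionary's get method serves as a stand-in catalog here.
catalog = {"-//ACME//DTD Report//EN": "http://www.acme.com/dtds/report.dtd"}
found = choose_system_id("-//ACME//DTD Report//EN", "report.dtd", catalog.get)
fallback = choose_system_id("-//ACME//DTD Memo//EN", "memo.dtd", catalog.get)
```

When neither the catalog nor the declaration supplies a system
identifier, the sketch returns None and the caller must decide how to
report the unresolved entity.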

Received on Sunday, 30 March 1997 18:54:58 UTC