- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Sun, 30 Mar 97 17:46:59 CST
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
A proposal for specific wording to describe public identifier and their resolution, based on Paul Grosso's posting of the other day but modified by myself (Paul bears no blame for this) is at http://www.uic.edu/~cmsmcq/tech/xml/pisocat.html and may be read there. I think the catalog resolution mechanism described there is (a) simple to implement and use, (b) sufficiently flexible for the real world, and (c) compatible with full SOCats (check me on this, Paul!). I also think it's ready for inclusion in the XML-Lang spec, but I realize not everyone will see it that way. So now there are two proposals on the table, substantially identical (only difference is in the method of finding the catalog file) though different in style. Comments, please. -C. M. Sperberg-McQueen p.s. in case there is anyone who would rather read this in email instead of on the Web, I append an ASCII version of the document, prettied up from what Charlotte produces. ------- Web page: HTTP://www.uic.edu/~cmsmcq/tech/xml/pisocat.html Changes to XML To Support Public Identifiers And PI Resultion Using Socats C. M. Sperberg-McQueen 28 March 1997, rev. 30 March 1997 _________________________________________________________________________ Table of Contents * 1 Public Identifier Syntax * 2 External Entities (Proposed Revision) * 3 Resolving PUBLIC Identifiers (New section) _________________________________________________________________________ This document describes some possible changes to the draft XML specification. The first part describes the changes necessary to make the XML external identifier rules support public identifiers syntactically. The second and third parts are sample draft language for a revision of the section on external entities and for a new section on resolving public identifiers. This draft language represents solely the opinion of the author and has not been approved by anyone, let alone the W3C SGML ERB. 1 Public Identifier Syntax To allow public identifiers, the productions in the section on external identifiers would need to be changed as follows: ExternalID ::= SystemID | PublicID SystemID ::= 'SYSTEM' S SystemLiteral SystemLiteral ::= '"' (URLchar)* '"' | "'" (URLchar - "'")* "'" URLchar ::= (the set of characters legal in URLs) PublicID ::= 'PUBLIC' S PublicLiteral (S SystemLiteral)? PublicLiteral ::= '"' PubLitChar* '"' | "'" PubLitChar* "'" PubLitChar ::= (#x000D | #x000A | [A-Z] | [a-z] | [0-9] | ' ' | "'" | '(' | ')' | '+' | ',' | '-' | '.' | '/' | ':' | '=' | '?' 2 External Entities (Proposed Revision) If the entity is not internal, it is an external entity, declared as follows: ExternalDef ::= External ID NDataDecl? /* unchanged */ ExternalID ::= 'PUBLIC' S PubLiteral ( S SystemLiteral )? | | 'SYSTEM' S SystemLiteral SystemLiteral ::= '"' (URLchar)* '"' | "'" (URLchar - "'")* "'" URLchar ::= (the set of characters legal in URLs) URLchar ::= [a-zA-Z0-9] | ';' | '/' | '?' | ':' | '@' | '&' | '=' | '+' /* reserved */ | '$' | '-' | '_' | '.' | '!' | '~' | '*' | "'" | '(' | ')' | ',' /* mark */ | '%' /* for hex escapes %NN */ /* N.B. control characters, space, and the characters < > # " { } | \ ^ [ ] ` are not allowed in URLs, according to the relevant RFCs */ PubLiteral ::= '"' PubChar* '"' | "'" PubChar* "'" PubChar ::= Letter | Digit | S | ['()+,-./:=?] (or change grammar as given above -Ed.) The PubLiteral and SystemLiteral strings which may occur within the ExternalID are called the entity's public identifier and its system identifier, respectively. The system identifier is a URL, which may be used to retrieve the content of the entity. the public identifier is a location-independent name for the entity, which can be resolved to a URL using the mechanism described below . Unless otherwise provided by information outside the scope of this specification (e.g. a special XML element defined by a particular DTD, or a processing instruction defined by a particular application specification), relative URLs are relative to the location of the entity or file within which the entity declaration occurs. Relative URLs in entity declarations within the internal DTD subset are thus relative to the location of the document; those in entity declarations in the external subset are relative to the location of the files containing the external subset. 3 Resolving PUBLIC Identifiers (New section) (This section is intended to be inserted immediately following the section on External Entities. -Ed.) XML processors may implement any mechanism they choose for resolving public identifiers (i.e. finding a URL for some physical copy of the object named). At user option, however, they must also support the mechanism described in this section. An XML processor can resolve a public identifier to a system identifier by looking up the public identifier in a supplemental catalog, which has the following structure: XMLCatalog ::= S? ( ( catComment | pubEntry | otherEntry ) ( S ( catComment | pubEntry | otherEntry) )* )? catComment ::= '--*' (Char* - (Char* '*--' Char*) '*--' pubEntry ::= 'PUBLIC' S PublicID S SystemLiteral otherEntry ::= catKeyword (S SystemLiteral)+ catKeyword ::= (Char* - (S | SystemLiteral | 'PUBLIC' | PublicID | catComment)) A catPublic entry maps a public identifier into a system identifier, which may be used to locate the entity itself. For example: PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" "iso-lat1.gml" PUBLIC "-//ACME//DTD Report//EN" "http://www.acme.com/dtds/report.dtd" The catalog format is that defined by SGML Open Technical Resolution 9401:1995 (Amendment 1 to TR 9401), which defines several keywords in addition to PUBLIC. These are matched by the otherEntry rule, and may be ignored (or acted on) by XML processors. If the public identifier in a catalog entry matches that given in an ExternalID, then the system identifier in the catalog entry is associated with the public identifier in question and may be used to retrieve it. Before matching takes place, both public identifiers must be normalized: leading and trailing white space is stripped, and embedded white space is replaced by single space (#x0020) characters. (Except that no entity references are recognized, this is the same normalization as is performed for attribute values of type CDATA.) The catalog lookup may involve more than one catalog file; it ends when the first matching entry is found. At user option, the XML processor must look first for a catalog file on the local system; the location of this catalog file, and the method of identifying it, are outside the scope of this specification. If no matching entry is found in the local catalog, the XML processor must look next in the default catalog. Unless otherwise provided by information outside the scope of this specification (e.g. a special XML element defined by a particular DTD, or a processing instruction defined by a particular application specification), the default catalog is that found using the relative URL catalog . If no matching entry is found in either the local catalog (if any) or in the default catalog (if any), then the XML processor may treat the catalog lookup process as having failed. If catalog lookup on a public identifier fails, [possibly add: or an attempt to retrieve the entity using the result of the catalog lookup fails, -Ed.] and a system identifier was supplied in the externalID, then an XML processor must behave as if the system identifier was the only identifier supplied.
Received on Sunday, 30 March 1997 18:54:58 UTC