Public IDs for notations in XML


As the ERB has grappled with the issue of public identifiers in XML, we
have realized that, regardless of the final decision, there probably needs
to be a distinction between the rules for entities and the rules for data
content notations.  In particular, it doesn't really make sense for the
system identifiers of notations to be URLs; MIME types seem a much better
fit.  Notation declarations would then be mappings from local notation
names and (potentially) public identifiers to MIME types, which would be
useful for both XML and traditional SGML processors.  Thus we are thinking
of a solution along the lines of the following straw proposal.  In this
proposal I'm assuming that public identifiers are allowed (at least for
notations), although I realize they might not be, or might turn out to be
URNs or something similar.


Notation declarations consist of a notation name followed by an external
identifier.  The external identifier consists of either the keyword
"SYSTEM" followed by a literal containing a MIME type specification or the
keyword "PUBLIC" followed by a literal containing a public identifier
followed by a literal containing a MIME type specification.

The MIME type specification could either be just a MIME type name, e.g.,
"image/jpeg" (optionally followed by parameters), or it could,
potentially, be an SGML formal system identifier with a storage manager of
MIMETYPE: "<mimetype>image/jpeg".
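
To make the two forms concrete, here is a rough Python sketch of how a
processor might normalize either one.  The function name parse_mime_spec
and the tuple it returns are inventions for illustration, not part of the
proposal, and the parameter handling is deliberately naive (it does not
cope with semicolons inside quoted parameter values).

```python
import re

# Hypothetical helper: the name and return shape are assumptions.
_FSI_PREFIX = re.compile(r"^<mimetype>", re.IGNORECASE)

def parse_mime_spec(spec):
    """Split a MIME type specification into (type, subtype, params)."""
    # Strip the optional formal-system-identifier storage manager tag.
    body = _FSI_PREFIX.sub("", spec.strip())
    # Parameters, if any, follow the type name, separated by semicolons.
    media, *raw_params = [part.strip() for part in body.split(";")]
    major, sep, minor = media.partition("/")
    if not (major and sep and minor):
        raise ValueError(f"not a MIME type: {spec!r}")
    params = {}
    for item in raw_params:
        if not item:
            continue
        key, _, value = item.partition("=")
        params[key.strip().lower()] = value.strip().strip('"')
    return major.lower(), minor.lower(), params
```

Under this sketch, "image/jpeg" and "<mimetype>image/jpeg" come out
identical, which is the point of allowing both spellings.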

Thus, a notation declaration could take either of the following forms:

<!NOTATION Graphic PUBLIC "-//Joint Photographic Experts Group//NOTATION
JPEG Image Format//EN" "image/jpeg" >


<!NOTATION Graphic SYSTEM "image/jpeg" >
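
As a sanity check that the two forms are easy to tell apart, here is a
minimal Python sketch that recognizes both.  The pattern and the
parse_notation name are assumptions made for illustration; it handles
only double-quoted literals and is not a real SGML or XML parser.

```python
import re

# Hypothetical pattern covering only the two forms in this proposal.
_NOTATION = re.compile(
    r'<!NOTATION\s+(?P<name>[A-Za-z][\w.-]*)\s+'
    r'(?:SYSTEM\s+"(?P<sys_mime>[^"]*)"'
    r'|PUBLIC\s+"(?P<pubid>[^"]*)"\s+"(?P<pub_mime>[^"]*)")'
    r'\s*>'
)

def parse_notation(decl):
    """Return the name, public identifier (or None), and MIME type spec."""
    m = _NOTATION.fullmatch(decl.strip())
    if m is None:
        raise ValueError("not a notation declaration of either form")
    pubid = m.group("pubid")
    if pubid is not None:
        # Collapse whitespace runs (including line breaks) to single spaces.
        pubid = " ".join(pubid.split())
    mime = m.group("sys_mime")
    if mime is None:
        mime = m.group("pub_mime")
    return {"name": m.group("name"), "public_id": pubid, "mime_spec": mime}
```

Both declarations above yield the name "Graphic" and the MIME type
specification "image/jpeg"; only the first carries a public identifier.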

If XML does not specify a public identifier resolution mechanism, then a
public identifier would not be allowed without a corresponding MIME type.
If there is such a mechanism, then the mapping to MIME types could,
potentially, be done there.
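
The rule can be sketched in a few lines of Python.  Everything here is
hypothetical: the function name mime_type_for is invented, and the
"resolver" is simply assumed to be a mapping from public identifiers to
MIME types, standing in for whatever resolution mechanism XML might
eventually define.

```python
def mime_type_for(public_id=None, mime_spec=None, resolver=None):
    """Pick the MIME type for a notation, or reject the declaration."""
    # An explicit MIME type specification always wins.
    if mime_spec:
        return mime_spec
    if public_id is None:
        raise ValueError("a notation needs a MIME type or a public identifier")
    # No resolution mechanism: a bare public identifier is not allowed.
    if resolver is None:
        raise ValueError("public identifier without a MIME type and no resolver")
    try:
        return resolver[public_id]
    except KeyError:
        raise ValueError(f"unresolved public identifier: {public_id!r}") from None
```

With no resolver, a declaration carrying only a public identifier is
rejected; with one, the lookup supplies the MIME type.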

One reason for allowing the public identifier for notations without a
resolution mechanism is to make the correlation between existing notations
used in SGML documents and the MIME types used for Internet delivery
clearer, especially to those who already have SGML documents.  Obviously,
if you are coming from HTML and have only ever thought about MIME types,
you don't necessarily need or want a public ID and don't need to specify
one.

In addition, many SGML processes (including much of architecture
processing) depend on notations and on public identifiers for notations,
so allowing public identifiers for notations, even if we don't allow them
for entities, will make it much easier to take advantage of these
facilities from XML.


W. Eliot Kimber (eliot@isogen.com) 
Senior SGML Consulting Engineer, Highland Consulting
2200 North Lamar Street, Suite 230, Dallas, Texas 75202
+1-214-953-0004 +1-214-953-3152 fax
http://www.isogen.com (work) http://www.drmacro.com (home)
"Rats in the morning, rats in the afternoon...if they don't go away, I'll be
re-educated soon..."                 --Austin Lounge Lizards, "1984 Blues"