XML as Conforming SGML (aka Core SGML)

Charles F. Goldfarb

This document is a specification of a DTD subset notation that allows, to facilitate comparison, exactly the XML constraints proposed by Tim Bray in MSGML. (This author has somewhat different preferences for XML constraints.) In addition, it includes formal definitions for the distinct kinds of data (such as public identifiers) that MSGML just labels "CDATA".

The notation used is not intended to be read by parser-generator tools. It is designed to show how little of SGML is needed for a useful core and how concisely it can be presented and explained. Time did not permit the creation of an equivalent grammar in a form suitable for parser-generator input, but it is obviously possible to create one.

The 8879 metalanguage is used to allow easy comparison with full SGML, with the following enhancements:

1. Delimiters are shown as literals, since XML has a fixed concrete syntax.
2. The sequence commas are omitted.
3. Spaces between tokens (rather than "s") indicate when whitespace is allowed. When tokens are concatenated, no whitespace is allowed.

Note that "--" is not permitted in any declaration.

elemtype-dcl = ""
choice = "(" elem-cont ("|" elem-cont)* ")"("?"|"*"|"+")
seq = "(" elem-cont ("," elem-cont)* ")"("?"|"*"|"+")
mixed = "(" mixed-cont ("|" mixed-cont)* ")*"
elem-cont = (Gi("?"|"*"|"+") | choice | seq)
mixed-cont = (Gi("?"|"*"|"+") | "#PCDATA" | mixed)
attlist-dcl = ""
attdef = Attname dec-value default-spec
dec-value = ("CDATA" | tokenized | enum-list | notname-list)
tokenized = ("ID"|"IDREF"|"IDREFS"|"ENTITY"|"ENTITIES"|"NAME"|"NAMES"
            |"NMTOKEN"|"NMTOKENS"|"NUMBER"|"NUMBERS"|"NUTOKEN"|"NUTOKENS")
enum-list = "(" NMTOKEN ("|" NMTOKEN)* ")"
notname-list = "NOTATION" "(" Notname ("|" Notname)* ")"
default-spec = ("#REQUIRED" | "#IMPLIED" | ("#FIXED"? a-literal))
entity-dcl = ""
external-spec = external-id ("NDATA" Notname)?
p-entity-dcl = ""
notation-dcl = ""
external-id = ("PUBLIC" m-literal) | ("SYSTEM" system-id) | ("PUBLIC" m-literal system-id) ">"
system-id = (('"' fsi '"') | ("'" fsi "'") | p-literal)
fsi = "<" smname ("base" "=" literal)? ">"(#CHAR|entref)*""
smname = ("url" | "osfile")
p-literal = (('"'(#CHAR|p-entref)*'"') | ("'"(#CHAR|p-entref)*"'"))
p-entref = "%"P-entname";"
a-literal = (('"'(#CHAR|g-entref)*'"') | ("'"(#CHAR|g-entref)*"'"))
g-entref = "&"Entname";"
m-literal = (('"'#MINCHAR*'"') | ("'"#MINCHAR*"'"))

WHERE:

1. Syntactic variables in initial caps are NAMES in the XML concrete syntax; that is: Gi Attname Entname Notname.
2. Syntactic variables in all caps are character strings whose spelling is defined by the XML concrete syntax; that is: NMTOKEN.
3. #CHAR is a character in the XML character set.
4. #MINCHAR is a minimum data character in the XML character set.
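
As a rough illustration of how the content-model productions (choice, seq, mixed, elem-cont, mixed-cont) might be checked mechanically, here is a minimal Python sketch of a hand-written recursive-descent recognizer. It is not the parser-generator grammar mentioned above, and every name in it (Recognizer, matches_elem_cont, matches_mixed, ModelError) is hypothetical. Two assumptions are made: a NAME is taken to be a letter followed by letters, digits, "." or "-", since this document does not spell out the XML name characters; and the productions are read exactly as printed, so an occurrence indicator is required after every Gi and every choice or seq group.

```python
import re

# Assumption: NAME characters are not defined in this document; this
# pattern (letter, then letters/digits/"."/"-") is a stand-in.
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")
WS = re.compile(r"\s*")
OCCURRENCE = "?*+"


class ModelError(ValueError):
    """Raised when the input does not match the production being tried."""


class Recognizer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos:self.pos + 1]

    def skip_ws(self):
        # Called only where the productions show a space between tokens.
        self.pos = WS.match(self.text, self.pos).end()

    def expect(self, literal):
        if not self.text.startswith(literal, self.pos):
            raise ModelError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def occurrence(self):
        # ("?"|"*"|"+") is concatenated to what precedes it: no whitespace.
        ch = self.peek()
        if not ch or ch not in OCCURRENCE:
            raise ModelError(f"expected occurrence indicator at {self.pos}")
        self.pos += 1

    def gi(self):
        m = NAME.match(self.text, self.pos)
        if m is None:
            raise ModelError(f"expected a name at offset {self.pos}")
        self.pos = m.end()
        self.occurrence()

    def elem_cont(self):
        # elem-cont = (Gi occ | choice | seq)
        if self.peek() == "(":
            self.group()
        else:
            self.gi()

    def group(self):
        # choice and seq have the same shape and differ only in the
        # connector, which may not change within one group.
        self.expect("(")
        self.skip_ws()
        self.elem_cont()
        self.skip_ws()
        connector = None
        while self.peek() in ("|", ","):
            if connector is None:
                connector = self.peek()
            elif self.peek() != connector:
                raise ModelError(f"mixed connectors at offset {self.pos}")
            self.pos += 1
            self.skip_ws()
            self.elem_cont()
            self.skip_ws()
        self.expect(")")
        self.occurrence()

    def mixed_cont(self):
        # mixed-cont = (Gi occ | "#PCDATA" | mixed)
        if self.peek() == "(":
            self.mixed()
        elif self.text.startswith("#PCDATA", self.pos):
            self.pos += len("#PCDATA")
        else:
            self.gi()

    def mixed(self):
        # mixed = "(" mixed-cont ("|" mixed-cont)* ")*"
        self.expect("(")
        self.skip_ws()
        self.mixed_cont()
        self.skip_ws()
        while self.peek() == "|":
            self.pos += 1
            self.skip_ws()
            self.mixed_cont()
            self.skip_ws()
        self.expect(")*")


def _matches(method_name, text):
    r = Recognizer(text)
    try:
        getattr(r, method_name)()
    except ModelError:
        return False
    return r.pos == len(text)  # the whole string must be consumed


def matches_elem_cont(text):
    return _matches("elem_cont", text)


def matches_mixed(text):
    return _matches("mixed", text)
```

Under these assumptions, matches_elem_cont("(head+, body?)*") succeeds, while "(a+ | b*, c?)+" is rejected because one group mixes "|" and ",", and "(a+) *" is rejected because the occurrence indicator is concatenated to the closing parenthesis and so cannot be preceded by whitespace.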