
Re: A28: syntax of markup declarations?



On Thu, 03 Oct 96 18:37:34 CDT, Michael Sperberg-McQueen
<U35395@UICVM.CC.UIC.EDU> wrote:

>A.28 Should XML use the markup-declaration syntax described by ISO 8879
>clauses 10-11, or should XML define a specialized document type and let
>its markup declarations use the document-instance syntax, as proposed
>by MGML?

XML should use a proper subset of the ISO 8879 declaration syntax, for several
reasons:

1. The necessary subset is small, clean, and easily explained. I have attached
the grammar to this note. It has fewer than 30 productions. (SGML has almost
200.)

2. 20,000 or so people already know the DTD language. That is 20,000 more than
know MGML.

3. It is the semantics of markup declarations that presents learning
difficulties, not the syntax. The semantics will be the same in any case.

4. The same is true for implementation. While a second syntax is a burden, it is
a relatively small and easily automated one.

5. SGML instance markup is a great language for representing structured
information. It is a poor language for defining it. Tim's paper.dsd is three
times the size (in lines) of the attached paper.dtd.

6. All SGML tools can handle markup declarations. 

7. There are no SGML interoperability issues because it *is* SGML.

8. There is no problem putting markup declarations in "XML masquerading as
HTML". Declarations just look like long unknown tags. (HTML users may even find
them familiar for that reason.)

9. XML needs to be a conforming subset of SGML; otherwise it will be seen as a
competitor to SGML whatever our good intentions to the contrary. That perceived
competition will confuse users at best; at worst, it will ripen into real
competition, with users and vendors choosing sides.

10. Our objective for XML is to increase the SGML market by making it easier to
understand and implement. We only get this result if XML *is* SGML; otherwise,
SGML doesn't change at all. If XML is a conforming profile of SGML, it can be
the core of SGML97 -- the basic conformance level. The rest of SGML would be
defined as a delta on the core SGML; core XML/SGML users would never have to
read it.
--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
           13075 Paramount Drive * Saratoga CA 95070 * USA
  International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
 Prentice-Hall Series Editor * CFG Series on Open Information Management
--
                XML as Conforming SGML (aka Core SGML)

                         Charles F. Goldfarb

This document specifies a DTD subset notation that, to facilitate
comparison, allows exactly the XML constraints proposed by Tim Bray
in MSGML. (This author has somewhat different preferences for XML
constraints.) In addition, it formally defines the distinct kinds of
data (such as public identifiers) that MSGML just labels "CDATA".

The notation used is not intended to be read by parser-generator
tools. It is designed to show how little of SGML is needed for a
useful core and how concisely it can be presented and explained. Time
did not permit the creation of an equivalent grammar in a form
suitable for parser-generator input, but it is obviously possible to
create one.

The 8879 metalanguage is used to allow easy comparison with full SGML,
with the following enhancements:

1. Delimiters are shown as literals, since XML has a fixed concrete syntax.
2. The sequence commas are omitted.
3. Spaces between tokens (rather than "s") indicate when whitespace is
allowed. When tokens are concatenated, no whitespace is allowed.

Note that "--" is not permitted in any declaration.

<!-- ELEMENT TYPE declaration. The content can be:
  Empty
  Element content: a choice or sequence of element types
  Mixed content:   a choice of element types and data, or any element
                   types and data
-->
elemtype-dcl  = "<!ELEMENT" Gi ("EMPTY"|choice|seq|mixed|"ANY") ">"
choice        = "(" elem-cont ("|" elem-cont)* ")"("?"|"*"|"+")?
seq           = "(" elem-cont ("," elem-cont)* ")"("?"|"*"|"+")?
mixed         = "(" mixed-cont ("|" mixed-cont)* ")*"
elem-cont     = (Gi("?"|"*"|"+")? | choice | seq)
mixed-cont    = (Gi("?"|"*"|"+")? | "#PCDATA" | mixed)
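
The content-model productions above can be exercised with a small
recognizer. What follows is a minimal Python sketch, not a conforming
parser. It assumes that the occurrence indicator is optional (the
attached paper DTD writes TITLE and SEC+ side by side) and that a name
is a letter followed by letters, digits, "." or "-". The function names
tokenize, parse_group and classify are hypothetical.

```python
import re

# One token per match. A name or a ")" carries its occurrence indicator
# directly, because the grammar concatenates those tokens (no whitespace).
TOKEN = re.compile(r"\s*(#PCDATA|[A-Za-z][A-Za-z0-9.-]*[?*+]?|\)[?*+]?|[(|,])")


def tokenize(model):
    """Split a content model into tokens, skipping whitespace between them."""
    model, pos, toks = model.rstrip(), 0, []
    while pos < len(model):
        m = TOKEN.match(model, pos)
        if m is None:
            raise ValueError(f"unrecognized text at offset {pos}")
        toks.append(m.group(1))
        pos = m.end()
    return toks


def parse_group(toks, i):
    """Parse one "(...)" group; return (can_element, can_mixed, next_index)."""
    if i >= len(toks) or toks[i] != "(":
        raise ValueError("expected '('")
    i += 1
    can_element = can_mixed = True
    connector = None
    while True:
        item = toks[i] if i < len(toks) else ""
        if item == "(":
            sub_element, sub_mixed, i = parse_group(toks, i)
            can_element = can_element and sub_element
            can_mixed = can_mixed and sub_mixed
        elif item == "#PCDATA":
            can_element, i = False, i + 1
        elif item[:1].isalpha():
            i += 1
        else:
            raise ValueError(f"expected a name, #PCDATA or group, got {item!r}")
        sep = toks[i] if i < len(toks) else ""
        i += 1
        if sep.startswith(")"):
            break
        if sep not in ("|", ",") or connector not in (None, sep):
            raise ValueError(f"bad or inconsistent connector {sep!r}")
        connector = sep
    # The mixed production allows only "|" and must close with ")*".
    if connector == "," or sep != ")*":
        can_mixed = False
    if not (can_element or can_mixed):
        raise ValueError("#PCDATA is only allowed in a (...|...)* group")
    return can_element, can_mixed, i


def classify(model):
    """Return "EMPTY", "ANY", "element" or "mixed" for a content model."""
    model = model.strip()
    if model in ("EMPTY", "ANY"):
        return model
    toks = tokenize(model)
    can_element, _, end = parse_group(toks, 0)
    if end != len(toks):
        raise ValueError("text after the closing parenthesis")
    return "element" if can_element else "mixed"
```

For example, classify("(TITLE, AUTHS?, SEC+)") returns "element" and
classify("(#PCDATA | LIST | REF | qu)*") returns "mixed". Read
literally, the mixed production must close with ")*", so this sketch
rejects a bare "(#PCDATA)".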

<!-- ATTRIBUTE DEFINITION LIST declaration -->
attlist-dcl   = "<!ATTLIST" Gi attdef+ ">"
attdef        = Attname dec-value default-spec
<!--
  An attribute declared value can be CDATA, or a tokenized string, or
  an enumerated value list, or a notation name list.
-->
dec-value     = ("CDATA" | tokenized | enum-list | notname-list)
tokenized     = ("ID"|"IDREF"|"IDREFS"|"ENTITY"|"ENTITIES"|"NAME"|"NAMES"
                  |"NMTOKEN"|"NMTOKENS"|"NUMBER"|"NUMBERS"|"NUTOKEN"|"NUTOKENS")
enum-list     = "(" NMTOKEN ("|" NMTOKEN)* ")"
notname-list  = "NOTATION" "(" Notname ("|" Notname)* ")"
default-spec  = ("#REQUIRED" | "#IMPLIED" | ("#FIXED"? a-literal))
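
The attribute definition productions above are small enough to check
with a single regular expression. The Python sketch below is only an
illustration under stated assumptions: the NAME and NMTOKEN patterns
are guesses at the XML concrete syntax, the a-literal is approximated
as any quoted string without looking at entity references, and the
names ATTDEF and parse_attdefs are hypothetical.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9.-]*"   # assumed name syntax
NMTOKEN = r"[A-Za-z0-9.-]+"        # assumed name-token syntax
# Declared-value keywords from dec-value and tokenized, longest first.
KEYWORDS = ("CDATA|IDREFS|IDREF|ID|ENTITY|ENTITIES|NAMES|NAME|NMTOKENS|"
            "NMTOKEN|NUMBERS|NUMBER|NUTOKENS|NUTOKEN")


def group(item):
    """Regex text for "(" item ("|" item)* ")" with optional whitespace."""
    return rf"\(\s*{item}(?:\s*\|\s*{item})*\s*\)"


# attdef = Attname dec-value default-spec
ATTDEF = re.compile(
    rf"(?P<name>{NAME})\s+"
    rf"(?P<declared>(?:{KEYWORDS})\b|NOTATION\s*{group(NAME)}|{group(NMTOKEN)})\s*"
    r"""(?P<default>#REQUIRED|#IMPLIED|(?:#FIXED\s+)?(?:"[^"]*"|'[^']*'))"""
)


def parse_attdefs(body):
    """Split the text after "<!ATTLIST Gi" into (name, declared, default)."""
    body, pos, defs = body.strip(), 0, []
    while pos < len(body):
        m = ATTDEF.match(body, pos)
        if m is None:
            raise ValueError(f"bad attribute definition at offset {pos}")
        defs.append((m["name"], m["declared"], m["default"]))
        pos = m.end()
        while pos < len(body) and body[pos].isspace():
            pos += 1   # whitespace between consecutive definitions
    if not defs:
        raise ValueError("attdef+ requires at least one definition")
    return defs
```

Applied to the LIST attribute list in the attached paper DTD,
parse_attdefs("STYLE (NUM | BULLET) #IMPLIED") returns
[("STYLE", "(NUM | BULLET)", "#IMPLIED")].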

<!-- ENTITY declaration:
  The replacement text can be internal (in the p-literal, which can be
  a UNICODE number) or external.
-->
entity-dcl    = "<!ENTITY" Entname (p-literal | external-spec)">"
external-spec = external-id ("NDATA" Notname)?
p-entity-dcl  = "<!ENTITY" "%" P-entname (p-literal | external-id)">"

<!-- NOTATION declaration -->
notation-dcl  = "<!NOTATION" Notname external-id ">"

<!-- External identifier parameter -->
external-id   = ("PUBLIC" m-literal) | ("SYSTEM" system-id)
                  | ("PUBLIC" m-literal system-id)
system-id     = (('"' fsi '"') | ("'" fsi "'") | p-literal)
fsi           = "<" smname ("base" "=" literal)? ">"(#CHAR|entref)*"</>"
smname        = ("url" | "osfile")

<!-- Literals and entity references -->
p-literal     = (('"'(#CHAR|p-entref)*'"') | ("'"(#CHAR|p-entref)*"'"))
p-entref      = "%"P-entname";"
a-literal     = (('"'(#CHAR|g-entref)*'"') | ("'"(#CHAR|g-entref)*"'"))
g-entref      = "&"Entname";"
m-literal     = (('"'#MINCHAR*'"') | ("'"#MINCHAR*"'"))

WHERE:

1. Syntactic variables in initial caps are NAMES in the XML concrete
   syntax; that is: Gi Attname Entname P-entname Notname.
2. Syntactic variables in all caps are character strings whose
   spelling is defined by the XML concrete syntax; that is: NMTOKEN.
3. #CHAR is a character in the XML character set.
4. #MINCHAR is a minimum data character in the XML character set.
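
The literal productions and the character classes just defined can be
illustrated the same way. This Python sketch assumes that #CHAR is any
character other than the literal's own delimiter, that #MINCHAR is the
ISO 8879 minimum data set (letters, digits, space, record start and
end, and the specials ' ( ) + , - . / : = ?), and the same name syntax
as before. The helper names are hypothetical.

```python
import re

# Assumed #MINCHAR set: ISO 8879 minimum data characters.
MINCHAR = r"[A-Za-z0-9 \r\n'()+,\-./:=?]"
P_ENTREF = re.compile(r"%([A-Za-z][A-Za-z0-9.-]*);")   # p-entref
G_ENTREF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")   # g-entref


def unquote(literal):
    """Strip matching quotes; the delimiter may not occur inside."""
    if (len(literal) >= 2 and literal[0] in "\"'"
            and literal[-1] == literal[0]
            and literal[0] not in literal[1:-1]):
        return literal[1:-1]
    raise ValueError("not a quoted literal")


def p_literal_refs(literal):
    """Names of the parameter entities referenced in a p-literal."""
    return P_ENTREF.findall(unquote(literal))


def a_literal_refs(literal):
    """Names of the general entities referenced in an a-literal."""
    return G_ENTREF.findall(unquote(literal))


def is_m_literal(literal):
    """True if the text is a quoted string of minimum data characters."""
    try:
        body = unquote(literal)
    except ValueError:
        return False
    return re.fullmatch(MINCHAR + "*", body) is not None
```

For example, is_m_literal('"ISO 8879:1986//ENTITIES Added Latin 1//EN"')
is true, while a public identifier containing "<" is rejected.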

<!-- PAPER DTD: Observes XML constraints -->

<!element PAPER (TITLE, AUTHS?, SEC+)>

<!element TITLE (#PCDATA)>
<!element AUTHS (AUTH+)>
<!element AUTH  (#PCDATA)>

<!element SEC   (TITLE?, P+)>

<!element P     (#PCDATA | LIST | REF | qu)*>

<!element LIST  (ITEM+)>
<!attlist LIST
                STYLE (NUM | BULLET) #IMPLIED
>

<!element item  (#PCDATA | REF)*>

<!element REF   (#PCDATA)>

<!-- CFG: Added qu to fix markup error in XML paper -->
<!element qu    (#PCDATA)>
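
As an illustration of how easily declarations in this syntax can be
processed automatically, the Python sketch below reads element
declarations and reports element types that are used in a content
model but never declared. It assumes that names are folded to upper
case, as in the SGML reference concrete syntax; whether XML keeps that
rule is not settled here. SAMPLE is an excerpt of the DTD above, and
the function name undeclared is hypothetical.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
ELEMENT = re.compile(r"<!element\s+(\S+)\s+(.*?)>", re.IGNORECASE | re.DOTALL)
NAME = re.compile(r"#?[A-Za-z][A-Za-z0-9.-]*")

SAMPLE = """
<!element P     (#PCDATA | LIST | REF | qu)*>
<!element LIST  (ITEM+)>
<!element item  (#PCDATA | REF)*>
<!element REF   (#PCDATA)>
<!-- CFG: Added qu to fix markup error in XML paper -->
<!element qu    (#PCDATA)>
"""


def undeclared(dtd, fold_case=True):
    """Return the sorted element types that are used but not declared."""
    norm = str.upper if fold_case else (lambda s: s)
    declared, used = set(), set()
    for gi, model in ELEMENT.findall(COMMENT.sub("", dtd)):
        declared.add(norm(gi))
        for name in NAME.findall(model):
            # Skip reserved words such as #PCDATA, EMPTY and ANY.
            if not name.startswith("#") and name not in ("EMPTY", "ANY"):
                used.add(norm(name))
    return sorted(used - declared)
```

With case folding, undeclared(SAMPLE) returns []. With
fold_case=False it returns ["ITEM"], because LIST refers to ITEM while
the declaration is spelled item.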

