Re: Errors or am I misreading the standard?

>Date: Mon, 26 Apr 1999 21:09:14 -0400
>From: dvint@cstone.net (Dan Vint)

>I'm curious if many other people have had troubles with these areas. Most
>of the other rules all seem to flow nicely except for: 
>78 extParsedEnt
>79 extPE
>30 extSubset
>33 LanguageID
>6 Names
>8 Nmtokens
>
>these are all orphaned rules that only connect up through the text of the
>document.

I believe there have been queries about the external DTD subset,
external parameter entities, and the language ID; I don't believe
there have been queries about the Names and Nmtokens productions.  I'm
not sure whether readers have reported confusion about external parsed
entities or not.

>I looked at Tim Bray's annotated spec, Bob DuCharme's book version and a
>couple of other books and they all just sort of blindly go with what is
>presented and there is no analysis or explanation as to how you get to
>these rules and how PEs can be used in element content models. 

That's a good point, to be noted for any thorough revision in the future.

>It seems like there is room for lots of interpretation or incomplete
>reading that will cause some interesting debate at least. Isn't the
>approach taken here not quite an EBNF definition of the language? It's been
>some time since I've read about it in school and all, but it did seem like
>there was a requirement for a single starting point from which you should
>be able to get to everything. 

I think there are two basic problems.  One is that productions like
Names, Nmtokens, and extParsedEnt are present in order to be available
for reference from validity constraints or well-formedness
constraints, rather than being part of the syntax for the language.
This is a conscious design decision, motivated by experience with the
grammar in 8879.  The 8879 grammar does integrate the productions for
Names, etc., into the grammar for attribute values -- but since these
types are not lexically distinct from CDATA, this renders the grammar
hopelessly ambiguous.  The rule that an attribute declared as having
type NMTOKENS should match that production is a context-sensitive rule
in the definition of the language.  For this reason, it does not
bother me much that Names, Nmtokens, etc. are orphaned productions.
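
To make the ambiguity concrete, here is a minimal sketch in Python.  It
is not from the spec: the patterns are simplified, ASCII-only
approximations of the Name and Nmtoken productions, the CDATA pattern
ignores references, and the helper names matching_types and is_valid
are hypothetical.  It shows that a single attribute value can match
several of these productions at once, so only the declared type can
say which one applies.

```python
import re

# Simplified, ASCII-only approximations of the XML 1.0 productions
# (assumption: the real productions admit many more characters).
NAME_START = r"[A-Za-z_:]"
NAME_CHAR = r"[A-Za-z0-9._:\-]"
NAME = rf"{NAME_START}{NAME_CHAR}*"
NMTOKEN = rf"{NAME_CHAR}+"

# Attribute types mapped to the production their values must match.
# Values are assumed to be already normalized (single spaces only).
PATTERNS = {
    "CDATA": re.compile(r"[^<&]*"),                         # simplification
    "NMTOKEN": re.compile(NMTOKEN),                         # Nmtoken
    "NMTOKENS": re.compile(rf"{NMTOKEN}(?: {NMTOKEN})*"),   # Nmtokens
    "IDREFS": re.compile(rf"{NAME}(?: {NAME})*"),           # Names
}


def matching_types(value):
    """Return every type whose production the value matches lexically."""
    return [t for t, p in PATTERNS.items() if p.fullmatch(value)]


def is_valid(value, declared_type):
    """Context-sensitive check: test only the production for the declared type."""
    return PATTERNS[declared_type].fullmatch(value) is not None
```

In this sketch, matching_types("alpha beta") reports CDATA, NMTOKENS,
and IDREFS together, which is the lexical overlap described above.  A
value such as "1st 2nd" passes is_valid against NMTOKENS but fails
against IDREFS, and only the declaration can tell a processor which
check to run.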

The second problem is one that Dan Connolly raised during the
preparation of the spec, namely that the spec does not define clearly
what it is defining, and provides no formal model which would allow it
to talk clearly about both the sequences of characters allowable in
XML entities (document entity, external DTD subset, external parameter
entities, external parsed entities, etc.) and the sequence of
characters allowable in the logical document stream after expansion of
entity and character references.  The working group agreed in
principle that a clearer formal model was a good idea, but did not
take active steps to create or document a formal model.  There was
other pressing business, and the extra formality seemed likely to make
the spec less accessible to many readers, rather than more, so this is
I think an understandable chain of events, though from time to time I
think of it as unfortunate.  Getting a good clean formal model is made
particularly necessary, but also rather hard, by some of the
interactions between entity expansion and grammatical structure which
XML inherited from SGML.  One acid test for a formal model would have
to be: does it help make the rules for entity expansion easier to
express and understand?  So far, I have not seen any attempt at a
formal model that even provided vocabulary for talking about entity
expansion, let alone making the rules clearer.

>From my point of view, knowing how the standards come about (and having
>worked with some of the SGML details) it is hard to tell if some of these
>things were meant to be this way or if the "tying together" or linking
>features were deleted, but not all of the details, or if they were
>incomplete thoughts/concepts.

I don't think anyone on the WG would have objected in principle to
making the spec 'clearer' -- but any attempt at clarification does
tend to have flaws that risk getting it shot down.  But I think it's
fair to say that the WG did have a fairly clear understanding of what
was supposed to be legal XML, and that the rules governing the
external subset and so on were explicitly discussed.  Finding a clear
expression for those rules was a hard task, at which we sometimes
succeeded and sometimes fell short.

>I would like to give you the opportunity to maybe get in print an
>explanation of what was thought to link these features up as I feel
>compelled to try and link them into an understandable model. Now that what
>I thought was a smooth flow through the EBNF has some problems, I want to
>point to the pieces of text that allow these rules to be used. 

I think that's probably a good idea.

>If there is an official or even discussed logic that you would like to
>throw my way, I'll use it - otherwise I will be taking a stab at trying to
>say "parameter entities are allowed in element content model because ...."
>I would say 95% of this stuff flows in a logical manner and I guess this is
>why that large explanation about entities was introduced to manage these
>questions, but after getting lulled into nice clear definitions for
>everything else, these couple just seemed very odd.

I'm not quite sure what level of logic you're looking for.  At one
level, parameter entities are allowed in element content models
because

(a) they are an important tool in making DTDs maintainable;
(b) they are an essential tool in providing DTDs with a local-modification
layer.

At another level, they are allowed

(c) because the spec says they are, in the prose just before the
production for 'doctypedecl'.
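
Points (a) and (b) can be illustrated with a rough sketch in Python.
It is a naive textual expander, not a conforming DTD processor, and the
function name expand_parameter_entities and the sample entity name
"inline" are hypothetical.  It assumes every parameter entity is
declared with a double-quoted literal that contains no '>', and it
treats the declarations as one stream, whereas XML itself permits
parameter-entity references inside markup declarations only in the
external subset and in external parameter entities.  It relies on two
rules from the spec: the first declaration of an entity is the binding
one, and replacement text included in the DTD is padded with one
leading and one trailing space.

```python
import re

# Any markup declaration (assumption: no '>' inside literals).
DECL = re.compile(r"<![^>]*>")
# A parameter entity declaration with a double-quoted literal value.
PE_DECL = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')
# A parameter entity reference such as %inline; (ASCII names only).
PE_REF = re.compile(r"%([A-Za-z_:][A-Za-z0-9._:\-]*);")


def expand_parameter_entities(dtd):
    """Return the non-entity declarations with %name; references expanded."""
    entities = {}
    expanded = []
    for match in DECL.finditer(dtd):
        decl = match.group(0)
        pe = PE_DECL.fullmatch(decl)
        if pe:
            name, value = pe.groups()
            # The first declaration is binding; later ones are ignored.
            entities.setdefault(name, value)
            continue
        # Replacement text is padded with one space on each side.
        expanded.append(
            PE_REF.sub(lambda m: " " + entities[m.group(1)] + " ", decl)
        )
    return expanded


# A local modification placed first overrides the shared default.
SAMPLE = """
<!ENTITY % inline "em | code | kbd">
<!ENTITY % inline "em | code">
<!ELEMENT para (#PCDATA | %inline;)*>
"""
```

Calling expand_parameter_entities(SAMPLE) yields a single ELEMENT
declaration whose content model includes kbd, because the earlier
declaration of "inline" wins.  That is the local-modification layer in
miniature: an internal subset, which is read before the external
subset, can redefine an entity, and every content model that refers to
it changes without being edited.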

I hope this helps; good luck with your book.

-C. M. Sperberg-McQueen
 Senior Research Programmer, University of Illinois at Chicago

Received on Tuesday, 27 April 1999 10:54:01 UTC