Re: revised restatement of the RE rules
Michael Sperberg-McQueen <U35395@UICVM.CC.UIC.EDU> wrote:
> Here is my restatement of the RE rules
Thank you for providing that excellent summary.
Please consider posting it to comp.text.sgml too;
it deserves a wider audience.
And... now that you've put it that way... the RS/RE rules
don't look all that hard to implement after all...
> Clause 7.6.1 a says "the first RE in an element is ignored if no RS,
> data, or proper subelement preceded it."
Right before that it says "An RE remaining *after replacement
of all references and recognition of markup* is treated as data
unless...", &c, so in the production:
> nondata ::= comment declaration
> | shortref use declaration
> | link set use declaration
> | processing instruction
> | character reference
> | entity reference
> | marked section declaration
> | included subelement
> | short reference
> | entity-end
'character reference', 'entity reference', and 'short reference'
are never seen by this "phase" of the parser: they have already
been replaced by the time the RE rules are applied. (A quick test
with SGMLS confirms that record-ends after references that
expand into data are not discarded.)
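
To make the ordering concrete, here is a toy sketch in Python. It is
my own simplified token model, not how SGMLS or any real parser is
structured, and the names (expand_references, drop_first_re, the
token kinds) are made up for illustration. It implements only clause
7.6.1 a as quoted above and ignores entity-end and the other RE
clauses:

```python
# Toy model: element content is a list of (kind, value) tokens.
# Kinds used here (all hypothetical): "RS", "RE", "DATA", "ELEMENT"
# (a proper subelement), "NONDATA" (comment, PI, ...), and "REF"
# (an entity reference whose replacement is a list of tokens).

def expand_references(tokens, entities):
    """Phase 1: replace every REF token with its replacement tokens."""
    out = []
    for kind, value in tokens:
        if kind == "REF":
            out.extend(expand_references(entities[value], entities))
        else:
            out.append((kind, value))
    return out

def drop_first_re(tokens):
    """Phase 2: clause 7.6.1 a only. The first RE in the element is
    ignored if no RS, data, or proper subelement preceded it."""
    blockers = {"RS", "DATA", "ELEMENT"}
    seen_blocker = False
    first_re_seen = False
    out = []
    for kind, value in tokens:
        if kind == "RE" and not first_re_seen:
            first_re_seen = True
            if not seen_blocker:
                continue  # ignored: nothing significant came before it
        if kind in blockers:
            seen_blocker = True
        out.append((kind, value))
    return out

# An entity that expands into data, followed by a record end.
entities = {"e": [("DATA", "hello")]}
content = [("REF", "e"), ("RE", None), ("DATA", "more")]

# References replaced first: the RE follows data, so it is kept.
kept = drop_first_re(expand_references(content, entities))

# Treating the unexpanded reference as nondata loses the RE.
wrong = drop_first_re(content)
```

With the references expanded first, the RE in `kept` survives because
data precedes it; in `wrong`, where the reference is left in place and
treated as nondata, the same RE is dropped.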
> RS is significant only if it's markup -- since it can be markup only in
> a shortref, it's of no interest to XML. For our purposes, RS is always
> ignored, period.
I believe that RS can also slip through when it appears
inside a processing instruction (productions 44, 45,
47, 48, 50, and 51...).
(I only know this because it crashed Cost once...)