Re: revised restatement of the RE rules

Michael Sperberg-McQueen <U35395@UICVM.CC.UIC.EDU> wrote:

> Here is my restatement of the RE rules

Thank you for providing that excellent summary.
Please consider posting it to comp.text.sgml too;
it deserves a wider audience.

And... now that you've put it that way... the RS/RE rules
don't look all that hard to implement after all...

> Clause 7.6.1 a says "the first RE in an element is ignored if no RS,
> data, or proper subelement preceded it."

Right before that it says "An RE remaining *after replacement
of all references and recognition of markup* is treated as data
unless...", &c, so in the production:

>   nondata ::= comment declaration
>              | shortref use declaration
>              | link set use declaration
>              | processing instruction
>              | character reference
>              | entity reference
>              | marked section declaration
>              | included subelement
>              | short reference
>              | entity-end

'character reference', 'entity reference', and 'short reference'
are never seen by this "phase" of the parser. (A quick test
with SGMLS confirms that record-ends after references that
expand into data are not discarded.)
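
To make the point concrete, here is a rough sketch of rule (a) alone,
as quoted above, applied to a flat list of events for one element's
content. It assumes the list is what the parser sees *after* references
have been replaced and markup recognized, so a reference that expanded
into data simply shows up as DATA. The event names and the function
drop_first_re are made up for illustration; they are not taken from
SGMLS or any other parser, and the other RE rules in 7.6.1 are not
handled.

```python
# Hypothetical event kinds for one element's content, as seen after
# reference replacement and markup recognition.
RE, RS, DATA, SUBELEMENT, MARKUP = "RE", "RS", "DATA", "SUBELEMENT", "MARKUP"


def drop_first_re(events):
    """Apply only rule 7.6.1 (a): ignore the first RE in an element
    if no RS, data, or proper subelement preceded it."""
    out = []
    seen_first_re = False
    blocked = False  # set once an RS, data, or proper subelement is seen
    for kind in events:
        if kind == RE and not seen_first_re:
            seen_first_re = True
            if not blocked:
                continue  # first RE with nothing significant before it
        elif kind in (RS, DATA, SUBELEMENT):
            blocked = True
        out.append(kind)
    return out


# A comment declaration (MARKUP) before the first RE does not protect it:
print(drop_first_re([MARKUP, RE, DATA]))
# Data from an expanded reference before the first RE does protect it:
print(drop_first_re([DATA, RE, DATA]))
```

Under these assumptions the first call drops the RE and the second
keeps it, which is the behaviour the SGMLS test suggests.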


> RS is significant only if it's markup -- since it can be markup only in
> a shortref, it's of no interest to XML.  For our purposes, RS is always
> ignored, period.

I believe that RS can also slip through when it appears
inside a processing instruction (productions 44, 45,
47, 48, 50, and 51...).  

(I only know this because it crashed Cost once...)

--Joe English