Re: The RE rules in 14 lines

Charles@sgmlsource.com (Charles F. Goldfarb) wrote:
>
> For XML and SGML:
>
> An RE in data is insignificant (i.e. not passed to an application,
> which is to say, not part of the grove) when it occurs in any of the
> following patterns:
>
>   start-tag  nondata*  RE
>   RE         nondata*  end-tag
>   RS         nondata+  RE
>
> In applying this rule, a reference is transparent; only its
> replacement is considered.


For SGML, should this rule be applied after OMITTAG inference?
For example, in:

    <!doctype test [
        <!element test - - (x,y,z)>
        <!element (x,y,z) O O (#PCDATA)>
    ]>
    <test>
    <x>content of x</x>
    content of y
    <z>content of z</z>
    </test>

the REs before and after "content of y" are ignored,
even though the tags for y are omitted, so no syntactic
start-tag or end-tag matches the above patterns.

Also, an RE immediately following the start-tag of an EMPTY
element is not discarded, although it matches the first pattern:

    <!doctype test [
        <!element test - - (#PCDATA|e)*>
        <!element e - O EMPTY>
    ]>
    <test>
    asdf
    <e>
    qwerty
    </test>


Perhaps "start of element" and "end of element" would be
more appropriate than "start-tag" and "end-tag"?
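
As a rough illustration of the difference, here is a sketch in Python.
It is not a conforming ISO 8879 parser. The token model, the names
re_is_insignificant and element_events, the reading of "nondata" as
comments, processing instructions and markup declarations, and the
choice to skip RS when matching the first two patterns are all my
assumptions, not part of the quoted rule. OMITTAG inference is not
implemented; the inferred end of y is written in by hand.

```python
# Simplified model (assumption): a token is a (kind, value) pair with kinds
#   "start", "end"  element boundaries (value = element name)
#   "RS", "RE"      record start / record end
#   "data"          character data
#   "nondata"       comments, PIs, markup declarations (assumed meaning)

def _scan(tokens, i, step, skip):
    """Walk from position i in direction step, skipping kinds in `skip`.
    Return (kind of the first token not skipped, or None at either end,
            number of "nondata" tokens skipped on the way)."""
    n_nondata = 0
    j = i + step
    while 0 <= j < len(tokens) and tokens[j][0] in skip:
        if tokens[j][0] == "nondata":
            n_nondata += 1
        j += step
    kind = tokens[j][0] if 0 <= j < len(tokens) else None
    return kind, n_nondata

def re_is_insignificant(tokens, i):
    """Apply the three quoted patterns to the RE at tokens[i]."""
    assert tokens[i][0] == "RE"
    # Assumption: RS is skipped for patterns 1 and 2.
    before, _ = _scan(tokens, i, -1, {"nondata", "RS"})
    after, _ = _scan(tokens, i, +1, {"nondata", "RS"})
    rs_side, n = _scan(tokens, i, -1, {"nondata"})
    return (
        before == "start"                # start  nondata*  RE
        or after == "end"                # RE     nondata*  end
        or (rs_side == "RS" and n >= 1)  # RS     nondata+  RE
    )

def element_events(tokens, empty_names):
    """Read tags as element boundaries: an EMPTY element ends at its
    start-tag, so emit an "end" right after its "start"."""
    out = []
    for kind, value in tokens:
        out.append((kind, value))
        if kind == "start" and value in empty_names:
            out.append(("end", value))
    return out

def verdicts(tokens):
    """One boolean per RE, in document order (True = insignificant)."""
    return [re_is_insignificant(tokens, i)
            for i, tok in enumerate(tokens) if tok[0] == "RE"]

# Second example: <test> / asdf / <e> / qwerty / </test>
empty_doc = [
    ("start", "test"), ("RE", ""),
    ("RS", ""), ("data", "asdf"), ("RE", ""),
    ("RS", ""), ("start", "e"), ("RE", ""),
    ("RS", ""), ("data", "qwerty"), ("RE", ""),
    ("RS", ""), ("end", "test"),
]
by_tags = verdicts(empty_doc)                            # syntactic tags
by_elements = verdicts(element_events(empty_doc, {"e"})) # element boundaries
print(by_tags)      # [True, False, True, True]
print(by_elements)  # [True, False, False, True]

# First example, tail of the y record: "content of y" RE RS <z>
syntactic = [("data", "content of y"), ("RE", ""), ("RS", ""), ("start", "z")]
inferred = [("data", "content of y"), ("RE", ""), ("RS", ""),
            ("end", "y"), ("start", "z")]   # end of y added by hand
print(re_is_insignificant(syntactic, 1))  # False
print(re_is_insignificant(inferred, 1))   # True
```

Under these assumptions, matching the patterns against element
boundaries keeps the RE after <e> and drops the RE after
"content of y", while matching against syntactic tags does the
opposite in both cases.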


--Joe English

  jenglish@crl.com

Received on Thursday, 26 September 1996 17:42:38 UTC