Re: RS/RE: basic questions
(The discussion so far has focussed on RS/RE handling, but there
is also a problem with separator characters _other_ than
record-ends in mixed vs. element content.)
Paul Prescod <firstname.lastname@example.org> wrote:
> Joe seems to be proposing that if we
> * restrict PIs and comments to element content
> * restrict mixed-content-models to "|"
[ actually, "restrict mixed content to OR groups
with a REP occurrence indicator and which only contain
primitive content tokens", or something equivalent;
IOW, "no pernicious mixed content" ]
> * disallow inclusion exceptions
There's one more rule (which is the important one):
* disallow separator characters in element content
This is because markup in which a record-end follows the first </b>
end-tag inside A has a different meaning depending on whether A has
mixed content or element content. The record-end is significant in
the former case, and is ignored in the latter.
If A has element content, such markup would have to be written
without the record-end (or any other separator character) between
the subelements.
> then we can reduce the RS/RE handling rules to "Robert's Rules" ( =) ) of
> In data content:
> 1. If an element begins or ends with a newline [not entirely
> accurate, but this is what people see], the newline is ignored.
> 2. Newlines inside markup are ignored.
> 3. All other newlines are passed on.
Yes, as far as I can tell.
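For concreteness, here is a rough Python sketch of how I read the three
rules quoted above. It is only an illustration under simplifying
assumptions, not a parser: the name pass_newlines is made up, "\n"
stands in for a record boundary, and the input is assumed to contain
only start-tags and end-tags (no comments, PIs or declarations).

```python
import re

# Assumption: anything between "<" and ">" is a tag. Comments, PIs and
# declarations are not handled in this sketch.
TAG = re.compile(r"(<[^>]*>)")

def pass_newlines(doc):
    """Return the data characters of doc after applying the three rules."""
    parts = TAG.split(doc)  # odd indexes hold tags, even indexes hold text
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Rule 2: a tag contributes no data, so newlines inside it vanish.
            continue
        prev_tag = parts[i - 1] if i > 0 else ""
        next_tag = parts[i + 1] if i + 1 < len(parts) else ""
        # Rule 1: drop a newline right after a start-tag ...
        if part.startswith("\n") and prev_tag and not prev_tag.startswith("</"):
            part = part[1:]
        # ... and a newline right before an end-tag.
        if part.endswith("\n") and next_tag.startswith("</"):
            part = part[:-1]
        # Rule 3: every other newline is passed on unchanged.
        out.append(part)
    return "".join(out)
```

With this sketch, pass_newlines("<p>\none\ntwo\n</p>") gives "one\ntwo",
while the newline after </b> in "<a>x\n<b>y</b>\nz</a>" is kept, which
is the mixed-content reading.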
Charles' proposal is similar:
* restrict PIs and comment declarations to element content
* disallow mixed content
* disallow inclusion exceptions (or perhaps, disallow
included subelements in pseudoelement content).
* require data content to be delimited
The chief difference is in the second and fourth rules.
With these restrictions the RS/RE/separator character rules
are even simpler:
1. Delimited separator characters are data.
2. Undelimited separator characters are ignored.
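A small Python sketch of these two rules, again only as an illustration.
The delimiter is not named above, so the double quote used here is
purely my assumption, as is the function name delimited_data and the
choice to treat undelimited non-separator characters as an error. The
input is a run of characters between tags.

```python
SEPARATORS = " \t\r\n"  # assumption: space, tab and record boundaries

def delimited_data(content, delim='"'):
    """Return the data in content, keeping only delimited characters."""
    out = []
    inside = False
    for ch in content:
        if ch == delim:
            inside = not inside  # entering or leaving a delimited run
        elif inside:
            out.append(ch)       # rule 1: delimited characters are data
        elif ch not in SEPARATORS:
            raise ValueError("undelimited data character: %r" % ch)
        # rule 2: undelimited separator characters are simply skipped
    if inside:
        raise ValueError("unterminated delimited data")
    return "".join(out)
```

For example, delimited_data('\n  "a\nb"\n  " c"\n') gives "a\nb c": the
newline inside the delimiters survives, the ones outside do not.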