Re: RS/RE: basic questions from Joe English on 1996-09-23 (w3c-sgml-wg@w3.org from September 1996)

From: Joe English <jenglish@crl.com>
Date: Mon, 23 Sep 1996 10:52:55 -0700
To: w3c-sgml-wg@w3.org
Message-Id: <199609231752.AA02804@mail.crl.com>

(The discussion so far has focussed on RS/RE handling, but there
is also a problem with separator characters _other_ than
record-ends in mixed vs. element content.)

Paul Prescod <papresco@calum.csclub.uwaterloo.ca> wrote:

> Joe seems to be proposing that if we
>
> * restrict PIs and comments to element content
> * restrict mixed-content-models to "|"

  [ actually, "restrict mixed content to OR groups
    with a REP occurrence indicator and which only contain
    primitive content tokens", or something equivalent;
    IOW, "no pernicious mixed content" ]

> * disallow inclusion exceptions

There's one more rule (which is the important one):

  * disallow separator characters in element content

This is because things like:

	<a>
	<b>blah</b>
	<b>blah</b>
	</a>

have different meanings depending on whether A has mixed content or
element content.  The record-end after the first </b> end-tag is
significant in the former case, and is ignored in the latter.

If A has element content, the above would have to be written like:

	<a><b>blah</b><b>blah</b></a>

or

	<a
	><b>blah</b
	><b>blah</b
	></a>

instead.
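
To make the rule concrete, here is a rough Python sketch of a checker that
flags separator characters appearing directly inside an element that has
element content. It is an illustration under stated assumptions, not an
SGML parser: the tokenizer only understands bare start-tags and end-tags
(no attributes, comments, PIs, or tag omission), the set of element-content
elements is passed in by hand instead of being read from a DTD, and the
name `separator_violations` is made up for this example.

```python
import re

# Bare start-tag or end-tag; white space (including a newline) is allowed
# just before the closing ">", as in the "<a\n>" form shown above.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)\s*>")

def separator_violations(text, element_content):
    """Return (element, offset) pairs for each run of separator characters
    found directly inside an element listed in `element_content`."""
    stack = []        # currently open elements, innermost last
    violations = []
    pos = 0           # offset just past the previous tag
    for m in TAG.finditer(text):
        between = text[pos:m.start()]
        only_separators = between != "" and between.strip(" \t\r\n") == ""
        if only_separators and stack and stack[-1] in element_content:
            violations.append((stack[-1], pos))
        if m.group(1):            # end-tag: close the innermost element
            if stack:
                stack.pop()
        else:                     # start-tag: open a new element
            stack.append(m.group(2))
        pos = m.end()
    return violations

first = "<a>\n<b>blah</b>\n<b>blah</b>\n</a>"
second = "<a><b>blah</b><b>blah</b></a>"
third = "<a\n><b>blah</b\n><b>blah</b\n></a>"

print(separator_violations(first, {"a"}))   # [('a', 3), ('a', 15), ('a', 27)]
print(separator_violations(second, {"a"}))  # []
print(separator_violations(third, {"a"}))   # []
```

With A declared as element content, only the first form is flagged; the
two rewritten forms pass because their newlines are either absent or
inside markup.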

> then we can reduce the RS/RE handling rules to "Robert's Rules" ( =) ) of
>
> In data content:
>  1. If an element begins or ends with a newline [not entirely
>     accurate, but this is what people see], the newline is ignored.
>  2. Newlines inside markup are ignored.
>  3. All other newlines are passed on.

Yes, as far as I can tell.
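
A minimal Python sketch of those three rules, for illustration only. It
assumes a toy tokenizer that recognizes just bare start-tags and end-tags
(no attributes, comments, or PIs) and treats "\n" as the record end; the
names `tokenize` and `apply_rules` are hypothetical and do not come from
any of the proposals.

```python
import re

# Bare start-tag or end-tag. The trailing \s* swallows a newline placed
# inside the tag, which is how rule 2 (newlines in markup) is modelled.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)\s*>")

def tokenize(text):
    """Split text into ('start', name), ('end', name) and ('data', str)."""
    tokens = []
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            tokens.append(("data", text[pos:m.start()]))
        tokens.append(("end" if m.group(1) else "start", m.group(2)))
        pos = m.end()
    if pos < len(text):
        tokens.append(("data", text[pos:]))
    return tokens

def apply_rules(text):
    """Apply the three newline rules and return the surviving tokens."""
    tokens = tokenize(text)
    out = []
    for i, (kind, value) in enumerate(tokens):
        if kind == "data":
            prev_kind = tokens[i - 1][0] if i > 0 else None
            next_kind = tokens[i + 1][0] if i + 1 < len(tokens) else None
            # Rule 1: drop a newline right after a start-tag ...
            if prev_kind == "start" and value.startswith("\n"):
                value = value[1:]
            # ... and a newline right before an end-tag.
            if next_kind == "end" and value.endswith("\n"):
                value = value[:-1]
            if not value:
                continue
        # Rule 3: everything else, including other newlines, is passed on.
        out.append((kind, value))
    return out

print(apply_rules("<a>\n<b>blah</b>\n<b>blah</b>\n</a>"))
print(apply_rules("<a\n><b>blah</b\n><b>blah</b\n></a>"))
```

For the first input the only newline that survives is the one between the
first `</b>` and the second `<b>`; for the second input no newline
survives, since every newline sits inside markup.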

Charles' proposal is similar:

  * restrict PIs and comment declarations to element content
  * disallow mixed content
  * disallow inclusion exceptions (or perhaps, disallow
    included subelements in pseudoelement content).
  * require data content to be delimited

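The chief difference is in the second and fourth rules: Charles
disallows mixed content outright rather than restricting it, and
requires data content to be delimited rather than disallowing
separator characters in element content.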

With these restrictions the RS/RE/separator character rules
are even simpler:

  1. Delimited separator characters are data.
  2. Undelimited separator characters are ignored.
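
A rough Python sketch of these two rules. The delimiter is not specified
here, so the sketch assumes, purely for illustration, that data content is
enclosed in double quotes; that choice, the toy tag syntax, and the name
`delimited_data` are all assumptions and not part of Charles' proposal.

```python
import re

# Three alternatives, tried in order at each position:
#   1. a run of data enclosed in the assumed delimiter (double quotes)
#   2. a tag
#   3. any other text outside delimiters and tags
TOKEN = re.compile(r'"([^"]*)"|<[^>]*>|[^"<]+')

def delimited_data(text):
    """Return the data chunks of `text`, keeping separator characters
    only where they appear inside the (assumed) delimiters."""
    chunks = []
    for m in TOKEN.finditer(text):
        token = m.group(0)
        if m.group(1) is not None:
            # Rule 1: delimited separator characters are data.
            chunks.append(m.group(1))
        elif token.startswith("<"):
            continue  # markup, not data
        elif token.strip(" \t\r\n") == "":
            continue  # Rule 2: undelimited separators are ignored.
        else:
            # Data content must be delimited under this proposal.
            raise ValueError("undelimited data: %r" % token)
    return chunks

doc = '<a>\n<b>"blah\n"</b>\n<b>" blah"</b>\n</a>'
print(delimited_data(doc))  # ['blah\n', ' blah']
```

The newlines between tags vanish, while the newline and the leading space
inside the quotes are kept as data.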

--Joe English

  jenglish@crl.com

Received on Monday, 23 September 1996 13:52:39 UTC