
XML 1.0 - clarification - deciphering [51] Mixed

From: Kent M Pitman <kmp@harlequin.com>
Date: Mon, 4 May 98 23:03:17 EDT
Message-Id: <9805050303.AA00538@excel.harlequin.com>
To: xml-editor@w3.org
Cc: kmp@harlequin.com

Right now you have:

 [51] Mixed ::= '(' S? '#PCDATA' ( S? '|' S? Name )* S? ')*' |
                '(' S? '#PCDATA'                     S? ')'

As nearly as I can tell, the only point of separating Mixed's definition
into two parts is to control the '*' (making it required when a Name is
given) but this is an awfully weird way to say that.  The above makes the
final '*' be 'either required or not' (i.e., optional).  It seems to me it'd
be (approximately) 10,000% clearer to say:

 [51] Mixed ::= '(' S? '#PCDATA' ( S? '|' S? Name )+ S? ')*'      |
                '(' S? '#PCDATA'                     S? ')' '*'?

saying that when the #PCDATA has no following Names, the "*" is optional.
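
One way to check that the reformulation accepts the same strings as the
current rule is to translate both into regular expressions and compare
them on sample inputs.  The sketch below is only illustrative: S and
NAME are simplified ASCII approximations of the XML S and Name
productions, and the identifiers CURRENT, PROPOSED, accepts and samples
are made up for this example.

```python
import re

# Assumptions: simplified ASCII stand-ins for the XML S and Name productions.
S = r"[ \t\r\n]+"
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
OPT_S = rf"(?:{S})?"
NAME_ALT = rf"(?:{OPT_S}\|{OPT_S}{NAME})"   # ( S? '|' S? Name )

# Current rule [51]: the first branch may have zero Names and always ends in ')*'.
CURRENT = re.compile(
    rf"\({OPT_S}#PCDATA{NAME_ALT}*{OPT_S}\)\*"
    rf"|\({OPT_S}#PCDATA{OPT_S}\)"
)

# Proposed rule: at least one Name in the first branch; the PCDATA-only
# branch carries its own explicitly optional '*'.
PROPOSED = re.compile(
    rf"\({OPT_S}#PCDATA{NAME_ALT}+{OPT_S}\)\*"
    rf"|\({OPT_S}#PCDATA{OPT_S}\)\*?"
)


def accepts(pattern, text):
    """Return True if the whole of text matches the given rule."""
    return pattern.fullmatch(text) is not None


samples = [
    "(#PCDATA)",
    "(#PCDATA)*",
    "( #PCDATA | foo )*",
    "(#PCDATA|foo|bar)*",
    "(#PCDATA|foo)",   # Names given but no '*'
    "(#PCDATA|)*",     # missing Name
    "(foo)*",          # no #PCDATA
]

for text in samples:
    print(f"{text!r}: current={accepts(CURRENT, text)} proposed={accepts(PROPOSED, text)}")
```

On these samples the two patterns agree, which is consistent with the
claim that the rewrite changes how the rule reads, not what it accepts.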

The second formulation also has the important difference that it does not
require lookahead in order to successfully parse the first part (the part
in parens) knowing deterministically which branch you went through.
(Something I think is a tremendously important part of the grammar even
though you've already acknowledged you don't.)
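
To make the determinism point concrete, here is a rough sketch of a
hand-written parser for the reformulated rule.  It picks its branch from
the character at the current position ('|' or ')') and never needs to
scan past the closing paren to find out which alternative it is in.
The function name parse_mixed, its return shape, and the simplified
ASCII patterns for S and Name are assumptions made for this example.

```python
import re

# Assumptions: simplified ASCII stand-ins for the XML S and Name productions.
_OPT_S = re.compile(r"[ \t\r\n]*")
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def parse_mixed(text):
    """Parse text against the proposed [51]; return (names, starred)."""
    pos = 0

    def skip_s():
        nonlocal pos
        pos = _OPT_S.match(text, pos).end()

    def expect(literal):
        nonlocal pos
        if not text.startswith(literal, pos):
            raise ValueError(f"expected {literal!r} at offset {pos}")
        pos += len(literal)

    expect("(")
    skip_s()
    expect("#PCDATA")
    skip_s()

    # The branch is decided here: '|' selects the Name branch,
    # ')' selects the PCDATA-only branch.
    names = []
    while text.startswith("|", pos):
        expect("|")
        skip_s()
        match = _NAME.match(text, pos)
        if match is None:
            raise ValueError(f"expected a Name at offset {pos}")
        names.append(match.group())
        pos = match.end()
        skip_s()

    expect(")")
    if names:
        expect("*")            # required once any Name was given
        starred = True
    else:
        starred = text.startswith("*", pos)   # optional in the PCDATA-only case
        if starred:
            pos += 1

    if pos != len(text):
        raise ValueError(f"unexpected trailing input at offset {pos}")
    return names, starred
```

For example, parse_mixed("( #PCDATA | foo | bar )*") yields the two
Names with the star flag set, while parse_mixed("(#PCDATA|foo)") is
rejected because the required '*' is missing.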

- - - - 

BTW, I don't understand why "*" is permitted at all in the case
of just (#PCDATA), since no other elements are permitted there.
The point of a "*" is that in a model such as (#PCDATA | foo)*
you can interleave text and foo elements, allowing repeated pcdatas
or foos.  But with (#PCDATA)* the entire thing is just one big block
of parsed character data, and there can't be two adjacent blocks of
parsed character data, since there would be no uniquely identified
point at which to make the division (without making the parse
nondeterministic).

If you were going to disallow the '*' in the #PCDATA-only case,
you would DEFINITELY want to use my reformulation using "+" rather
than "*" for the set containing the Names.
Received on Monday, 4 May 1998 22:59:55 UTC
