- From: Kent M Pitman <kmp@harlequin.com>
- Date: Mon, 4 May 98 23:03:17 EDT
- To: xml-editor@w3.org
- Cc: kmp@harlequin.com
Right now you have:

    [51] Mixed ::= '(' S? '#PCDATA' ( S? '|' S? Name )* S? ')*'
                 | '(' S? '#PCDATA' S? ')'

As nearly as I can tell, the only point of separating Mixed's definition into two parts is to control the '*' (making it required when a Name is given), but this is an awfully weird way to say that. When no Name is given, the above makes the final '*' 'either required or not' (i.e., optional). It seems to me it'd be (approximately) 10,000% clearer to say:

    [51] Mixed ::= '(' S? '#PCDATA' ( S? '|' S? Name )+ S? ')*'
                 | '(' S? '#PCDATA' S? ')' '*'?

saying that when the #PCDATA has no following names, the '*' is optional.

The second formulation also has the important difference that it does not require lookahead in order to successfully parse the first part (the part in parens) knowing deterministically which branch you went through. (I think this is a tremendously important part of the grammar, even though you've already acknowledged you don't.)

- - - -

BTW, I don't understand why '*' is permitted at all in the case of just (#PCDATA), since there are no other elements permitted. The point of a '*' is so that in (#PCDATA|Foo) you can do

    ..pcdata..<Foo>..foodata..</Foo>..pcdata..<Foo>..foodata..</Foo>..pcdata..

allowing repeated pcdatas or foos. But with (#PCDATA) the entire thing is just one big ..pcdata.., and there can't be two blocks of parsed character data as in ..pcdata....pcdata.., since there would be no uniquely identified point at which to make the division (without making the parse nondeterministic).

If you were going to disallow the '*' in the #PCDATA-only case, you would DEFINITELY want to use my reformulation using '+' rather than '*' for the set containing the Names.
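The claim that the two formulations of [51] accept the same strings can be spot-checked mechanically. What follows is only a rough sketch in Python, not anything from the spec: each formulation is transcribed into a regular expression. The names S, NAME, ALT, CURRENT, PROPOSED and accepts are invented for illustration, and NAME is an ASCII-only simplification of the real Name production.

```python
import re

# S? from the grammar: optional whitespace (space, tab, CR, LF).
S = r"[ \t\r\n]*"
# Assumption: ASCII-only stand-in for the XML Name production.
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"
# One "| Name" alternative: S? '|' S? Name
ALT = rf"(?:{S}\|{S}{NAME})"

# Existing formulation: ( ... )* group, with ')*' required in the first branch.
CURRENT = re.compile(rf"\({S}#PCDATA{ALT}*{S}\)\*|\({S}#PCDATA{S}\)")

# Suggested formulation: ( ... )+ group, with '*' optional in the second branch.
PROPOSED = re.compile(rf"\({S}#PCDATA{ALT}+{S}\)\*|\({S}#PCDATA{S}\)\*?")


def accepts(pattern, text):
    """Return True if the whole of text matches the given formulation."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "(#PCDATA)",
        "(#PCDATA)*",
        "( #PCDATA | Foo | Bar )*",
        "(#PCDATA|Foo)",
        "(Foo|#PCDATA)*",
    ]
    for text in samples:
        print(f"{text!r}: current={accepts(CURRENT, text)} "
              f"proposed={accepts(PROPOSED, text)}")
```

Agreement on a handful of samples is not a proof of equivalence, and a backtracking regex engine hides the lookahead question entirely. The sketch only illustrates that the reformulation moves the optional '*' into the #PCDATA-only branch without changing which declarations are accepted.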
Received on Monday, 4 May 1998 22:59:55 UTC