- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Sat, 26 Feb 2022 11:01:39 -0700
- To: Norm Tovey-Walsh <norm@saxonica.com>
- Cc: public-ixml@w3.org
Norm Tovey-Walsh writes:

>> Consider the following string.
>>
>> (1) S=a.a.a.a=a.a.a.a.=a.a.a.a='a'.
>>
>> If I have done my sums correctly, there are 27 ways to parse this string
>> in accord with our current grammar for ixml grammars.
>
> My parser says only seven. (Attached below.)

I have not compared our enumerations; my logic is that each of the
three substrings 'a.a.a.a' can be parsed like "a. a.a.a" or like
"a.a. a.a" or like "a.a.a. a". Three possibilities for each of three
positions, 3**3 = 27.

>> My first reaction is to think that this is a major problem (a four-alarm
>> fire, as some say) that needs attention as soon as possible. Do others
>> agree, or am I over-reacting?

> It’s definitely ugly. In practice, you can remedy the problems by
> putting a space in front of the periods that you are using for
> punctuation. A name can’t have a space, so that definitively resolves
> the ambiguity.

I note that here the "you" is the grammar writer. This is just me, but
if we have a choice I would rather we not be telling grammar writers
"write the grammar according to the rules; if it doesn't work, jiggle
it and try again".

> On that basis, I think we could technically live with it. (Am I right in
> my intuition that this kind of ambiguity always involves some unhygienic
> symbols or rules?)

You may be right. But I think not necessarily.

    S=a.a.a=c;a,a.a;a.a.a='a'.c='x'.

As far as I can tell, there are four parses for this, two of which are
clean: no undefined nonterminals, no unreachable nonterminals, no
unproductive nonterminals, no duplicate definitions.

> However, if we can find consensus to fix this, I would strongly prefer
> that.
>
> Off the top of my head,
> - we could remove “.” from the name characters,
> - we could use a different symbol for “end of rule”, or
> - we could require whitespace before the rule-ending “.”.

I think requiring whitespace between rules would also be a possibility.
(It makes the grammar non-LL(1), which is tiresome, but not, I think,
ambiguous.)

Michael

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
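
The counting argument in the message (three readings for each of three
'a.a.a.a' runs, hence 3**3 = 27) can be sketched in a few lines of
Python. This is only an illustration of that arithmetic, not an ixml
parser: the helper name `splits` is hypothetical, and the sketch assumes
that any single period inside a run may serve as the rule terminator and
that the three runs can be split independently. It does not check the
result against the full ixml grammar, so it says nothing about why
another parser might report a different number of parses.

```python
from itertools import product


def splits(run):
    """Return every (reference, next_rule_name) pair obtained by reading
    exactly one period in `run` as the end-of-rule marker.

    Simplifying assumption: both sides of the chosen period must be
    non-empty; no other ixml naming rules are checked.
    """
    pairs = []
    for i, ch in enumerate(run):
        if ch == ".":
            left, right = run[:i], run[i + 1:]
            if left and right:
                pairs.append((left, right))
    return pairs


run = "a.a.a.a"
per_run = splits(run)

# Assume the three runs in the example string are split independently.
combined = list(product(per_run, repeat=3))

print(per_run)        # the candidate readings of one run
print(len(combined))  # number of combinations across three runs
```

Under these assumptions each run has three readings, and the product
over three runs has 27 members, which matches the 3**3 figure in the
message.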
Received on Saturday, 26 February 2022 18:01:59 UTC