- From: Fredrik Öhrström <oehrstroem@gmail.com>
- Date: Tue, 3 Mar 2026 13:27:02 +0100
- To: ixml <public-ixml@w3.org>
- Message-ID: <CALZT+jDKf8C_xHLej_jj2L3SH=rV+bZJN4R+QRMVemswKCYY0g@mail.gmail.com>
I see two types of ambiguity: lexical and grammatical. Both show up as an ambiguous parse in ixml. If we can find the same technical solution for both, then marvellous, but I am inclined to think that we need two different technical solutions to address them.

The whitespace ambiguity and "eat the whole token please" (do not try all possible splits i+nt in+t int) belong to the same lexical ambiguity, which is normally handled by a greedy tokenizer. My "not charset" implementation takes a rule such as: name = [L]+, ![L]. and prevents the Earley parser from completing the rule unless there are no more letters to accumulate. I.e. it becomes greedy.

I think everyone would love to define whitespace as s = [' ';#10]+, ![' ';#10]. I think this would resolve a lot of the whitespace ambiguity problems that we experience.

Having a "not rule" does not make sense to me. I cannot see how to implement it either.

Then we have the grammar ambiguity, for example the dangling else problem. And then we have a new grammar ambiguity that used to be handled by the tokenizer, i.e. reserved keywords translate into a unique token whereas all other strings translate into, for example, a variable_name token. John dealt with this problem by creating the subtraction operator.

> The 'subtraction' operator I implemented (A¬B) was perhaps more restricted and designed to solve a specific problem in 'reserved keywords' in the XPath grammars.

I implement a cost value for a rule instead. Normally all rules have the same cost, and the resulting tree is a random pick among the possible ambiguous matching rules. But if a rule has a higher cost than the others, it will not be picked. Thus you give the catch-all rule a higher cost. John's example above would be written in my tool as: A =< .... where the < directly after the = means that this choice has a higher cost: pick other matching rules that have a lower cost, if they exist.
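To make the greedy effect of the trailing ![L] concrete, here is a small Python sketch. It is not an Earley parser and not my implementation: it simply brute-forces every way of reading a string as a sequence of adjacent names, with and without the lookahead. The function name name_splits is made up for this illustration, and [L] is approximated with str.isalpha(), which is an assumption.

```python
def name_splits(text: str, lookahead: bool) -> list[list[str]]:
    """List every way to read `text` as one or more adjacent names.

    A name is a run of letters ([L]+, approximated with str.isalpha()).
    With lookahead=True a name may only end where the next character
    is not a letter, mimicking a trailing ![L].
    """
    if not text:
        return [[]]  # exactly one way to parse nothing: no names
    results = []
    for end in range(1, len(text) + 1):
        if not text[end - 1].isalpha():
            break  # a name cannot extend past a non-letter
        if lookahead and end < len(text) and text[end].isalpha():
            continue  # ![L] fails: another letter follows, keep accumulating
        for rest in name_splits(text[end:], lookahead):
            results.append([text[:end]] + rest)
    return results


if __name__ == "__main__":
    print(name_splits("int", lookahead=False))  # every possible split
    print(name_splits("int", lookahead=True))   # only the whole token
```

Without the lookahead, "int" can be read in four ways (the three splits mentioned above plus i+n+t); with it, only the whole token remains, which is the behaviour a greedy tokenizer would give.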
So, two basic problems give rise to ixml ambiguity warnings:

- lexical (solved with greed)
- grammar (solved with cost or subtraction)

//Fredrik
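PS. A rough Python sketch of the cost idea, for illustration only. The names Alternative and pick_lowest_cost and the rule names are hypothetical, and the numeric costs are an assumption; all the mail says is that a rule marked with =< has a higher cost than the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    """One rule that matched the same span of input (hypothetical model)."""
    rule: str
    cost: int = 0  # assumed: 0 for normal rules, higher for rules marked =<


def pick_lowest_cost(matches: list[Alternative]) -> list[Alternative]:
    """Keep only the matching alternatives that have the lowest cost.

    If more than one alternative survives, the parse is still ambiguous
    and one of the survivors would be picked as before.
    """
    lowest = min(m.cost for m in matches)
    return [m for m in matches if m.cost == lowest]


if __name__ == "__main__":
    # "if" matches both the keyword rule and the catch-all name rule.
    both = [Alternative("variable_name", cost=1), Alternative("keyword_if")]
    print(pick_lowest_cost(both))

    # "foo" matches only the catch-all rule, so it is picked despite its cost.
    only_name = [Alternative("variable_name", cost=1)]
    print(pick_lowest_cost(only_name))
```

The catch-all rule still matches everything, but it only wins when no cheaper rule matched the same input.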
Received on Tuesday, 3 March 2026 12:27:33 UTC