Re: Pest and ordered choices

On Wednesday 06 July 2022 05:10:14 (+02:00), M Joel Dubinko wrote:


I’m probably miles behind the rest of you here, but I ran into an interesting problem trying to express the ixml grammar in Pest [1].


The | (vertical bar) operator in Pest is an ordered choice [2]. It short-circuits: it tries each alternative left to right and, on the first match, immediately succeeds for the whole expression. This means that rules like:


        -term: factor;
               option;
               repeat0;
               repeat1.


If expressed as-is in Pest, like this:


        term = _{ factor | option | repeat0 | repeat1 }


Against a rule like the right-hand side of
        Ixml: s, prolog? s .


The ‘prolog’ nonterminal will get picked up as a plain ‘factor’ every time, short-circuiting out the ‘option’ path (where the literal ‘?’ is referenced). I confirmed that changing the order of the alternatives fixes this immediate issue, but there are more complicated instances of this in the grammar, particularly between terminals and nonterminals (tmark and mark share prefixes).
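
To make the short-circuit concrete, here is a minimal sketch in Rust. It hand-rolls ordered choice rather than using Pest's own API, and it assumes a drastically simplified factor (a bare run of letters). The helper names (factor, option, choice, term_as_written, term_reordered) are made up for illustration and do not come from Pest or the ixml spec.

```rust
// Hand-rolled PEG-style matchers: each returns the number of bytes it consumed.
// This is NOT Pest's API, only an illustration of ordered-choice semantics.

/// Simplified factor: a run of ASCII letters (hypothetical stand-in for ixml's factor).
fn factor(input: &str) -> Option<usize> {
    let n = input.bytes().take_while(|b| b.is_ascii_alphabetic()).count();
    if n > 0 { Some(n) } else { None }
}

/// option: factor, "?"
fn option(input: &str) -> Option<usize> {
    let n = factor(input)?;
    if input[n..].starts_with('?') { Some(n + 1) } else { None }
}

/// Ordered choice: try alternatives left to right, commit to the first success.
fn choice(alts: &[fn(&str) -> Option<usize>], input: &str) -> Option<usize> {
    alts.iter().find_map(|alt| alt(input))
}

/// term = _{ factor | option }   (order as written in the ixml grammar)
fn term_as_written(input: &str) -> Option<usize> {
    choice(&[factor, option], input)
}

/// term = _{ option | factor }   (longer alternative tried first)
fn term_reordered(input: &str) -> Option<usize> {
    choice(&[option, factor], input)
}

fn main() {
    let input = "prolog?";
    // factor wins first and stops before the '?', so only 6 bytes are consumed.
    println!("as written: {:?}", term_as_written(input));
    // option is tried first and consumes all 7 bytes, including the '?'.
    println!("reordered:  {:?}", term_reordered(input));
}
```

With the alternatives in their original order the '?' is left unconsumed, so whatever rule follows term fails on it; putting the longer alternative first avoids that for this one case only.
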


This seems like it makes Pest unsuitable for this implementation, though I need to sleep on it before any final decisions.


Until we introduced the prolog, I think I'm right in saying that ixml needed only one character of lookahead. It is left-to-right, but in order to make it top-down (i.e. LL(1), which I think Pest is a subset of), you have to know which character starts each alternative. All the alternatives of term begin with a factor, so to make it LL(1), you'd have to refactor the grammar (sorry about that) to something like


 term: factor, ("?"; "*"; "+"; "++", sep; "**", sep).


but then you'd get a different parse tree.
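
As a rough sketch of that refactoring (in Rust, not ixml or Pest), the shared factor is parsed once and the next character then selects the suffix. Everything here is an assumption made for illustration: the Term and Suffix types, the function names, the letters-only factor, and the extra "no suffix" case for a plain factor.

```rust
// Left-factored term: parse the common factor first, then branch on the next
// character. All type and function names are hypothetical.

#[derive(Debug, PartialEq)]
enum Suffix {
    None,                    // plain factor, no operator
    Optional,                // "?"
    Repeat0(Option<String>), // "*" or "**", sep
    Repeat1(Option<String>), // "+" or "++", sep
}

#[derive(Debug, PartialEq)]
struct Term {
    factor: String,
    suffix: Suffix,
}

/// Simplified factor: a run of ASCII letters. Returns the name and the remaining input.
fn factor(input: &str) -> Option<(String, &str)> {
    let n = input.bytes().take_while(|b| b.is_ascii_alphabetic()).count();
    if n == 0 {
        return None;
    }
    Some((input[..n].to_string(), &input[n..]))
}

/// After a first "*" or "+": a doubled operator must be followed by a separator.
fn repeat_tail(rest: &str, op: char) -> Option<(Option<String>, &str)> {
    match rest.strip_prefix(op) {
        Some(after) => {
            let (sep, after) = factor(after)?;
            Some((Some(sep), after))
        }
        None => Some((None, rest)),
    }
}

/// term: factor, then a suffix chosen by inspecting one character at a time.
fn term(input: &str) -> Option<(Term, &str)> {
    let (name, rest) = factor(input)?;
    let mut chars = rest.chars();
    let (suffix, rest) = match chars.next() {
        Some('?') => (Suffix::Optional, chars.as_str()),
        Some('*') => {
            let (sep, r) = repeat_tail(chars.as_str(), '*')?;
            (Suffix::Repeat0(sep), r)
        }
        Some('+') => {
            let (sep, r) = repeat_tail(chars.as_str(), '+')?;
            (Suffix::Repeat1(sep), r)
        }
        _ => (Suffix::None, rest),
    };
    Some((Term { factor: name, suffix }, rest))
}

fn main() {
    println!("{:?}", term("prolog?"));
    println!("{:?}", term("rule++RS"));
}
```

The result is a factor with the operator attached as a trailing suffix, rather than a factor wrapped in an option, repeat0 or repeat1 node, which is the change in tree shape mentioned above.
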


We definitely took advantage of Earley-like parsing rules when designing ixml in order to generate a decent-looking XML parse-tree, and that then restricts the sort of parsing you can do. (I'm already regretting the prolog choice.)


Steven




[1] https://pest.rs
[2] https://pest.rs/book/grammars/syntax.html#ordered-choice

Received on Monday, 25 July 2022 17:08:44 UTC