- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Wed, 06 Jul 2022 09:54:56 -0600
- To: M Joel Dubinko <micah@dubinko.info>
- Cc: public-ixml@w3.org
Thank you for this. I haven't used PEGs myself (because it seems so hard to predict from the grammar what language it defines; perhaps I am just tripped up by the resemblance to conventional grammars), so I am a novice. But it occurs to me to wonder: given an LL(1) grammar, would a PEG parser and a conventional parser be guaranteed to recognize the same set of sentences? If so, would an LL(1) version of the ixml grammar be of interest to potential users of PEG parsers?

If PEG parsers are in fact unsuitable for recognizing arbitrary context-free grammars, then perhaps an LL(1) grammar for ixml would not in fact be helpful, since it would get you past one brick wall only to leave you looking at another. But if others would also be interested in an LL(1) grammar for ixml, it might spur me to get one done.

Michael

M Joel Dubinko <micah@dubinko.info> writes:

> In case anyone else wants to play around with this, I put this very rough draft up on GitHub. [1]
>
> Thanks to a nice WebAssembly hack, you can play with it in a browser without having to configure any Rust environment. Just go to https://pest.rs and paste the grammar from ixml.pest into the “Grammar” box, and a sample grammar into the “Input” box.
>
> I’ve been using a greatly simplified input, but feel free to put in all of ixml.ixml. :)
>
> With
>
>     ixml: s, prolog?, s .
>
> Or
>
>     ixml: s, prolog?, rule++RS, s.
>
> In particular, play around with the order of the choices in the ’term’ rule, as mentioned previously.
>
> It’s possible to get into a state where the parser says “expected comment”, which I am still figuring out.
>
> j
>
> [1] https://github.com/mdubinko/hackles/blob/main/src/ixml.pest
>
> On Jul 5, 2022, at 11:10 PM, M Joel Dubinko <micah@dubinko.info> wrote:
>
> I’m probably miles behind the rest of you here, but I ran into an interesting problem trying to express the ixml grammar in Pest [1].
>
> The | (vertical bar) operator in Pest is an ordered choice. It has an aspect of short-circuit evaluation, looking at each option left-to-right and, upon finding a match, immediately succeeding out of the whole expression. This means that, given rules like
>
>     -term: factor;
>            option;
>            repeat0;
>            repeat1.
>
> expressed as-is in Pest, like this:
>
>     term = _{ factor | option | repeat0 | repeat1 }
>
> and run against the right-hand side of a rule like
>
>     ixml: s, prolog?, s .
>
> the ‘prolog’ nonterminal will get picked up as a plain ‘factor’ every time, short-circuiting out the ‘option’ path (where the literal ‘?’ is referenced). I confirmed that changing the order of terms fixes this immediate issue, but there are more complicated instances of this in the grammar, particularly between terminal and nonterminal (tmark and mark share prefixes).
>
> This seems like it makes Pest unsuitable for this implementation, though I need to sleep on it before any final decisions.
>
> [1] https://pest.rs
> [2] https://pest.rs/book/grammars/syntax.html#ordered-choice

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
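To make the short-circuit behaviour of ordered choice concrete, here is a minimal sketch in Python (not Pest) of PEG-style combinators applied to a simplified, whitespace-free fragment modelled on the ixml 'term' rule. The helpers lit, name, seq, choice, opt, and accepts are hypothetical illustrations, not Pest or ixml APIs, and the name pattern is an assumption much narrower than ixml's real one. With factor listed first, "prolog?." fails: the ordered choice commits to the bare name, and when the following "." does not match there is no backtracking into the other alternatives. Reordering the alternatives fixes it, as does left-factoring term so that the suffix is chosen by its first character, which is the usual step in making such a rule LL(1).

```python
import re
from typing import Callable, Optional

# A parser takes (text, pos) and returns the position after a match, or None.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s."""
    def p(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return p


# Assumed, simplified name syntax (real ixml names allow more characters, including '.').
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def name(text: str, pos: int) -> Optional[int]:
    """Match a nonterminal name."""
    m = NAME_RE.match(text, pos)
    return m.end() if m else None


def seq(*parts: Parser) -> Parser:
    """Match each part in order; fail as soon as any part fails."""
    def p(text: str, pos: int) -> Optional[int]:
        for part in parts:
            nxt = part(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return p


def choice(*alts: Parser) -> Parser:
    """PEG ordered choice: commit to the FIRST alternative that succeeds.
    Later alternatives are never retried, even if the caller fails afterwards."""
    def p(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return p


def opt(part: Parser) -> Parser:
    """PEG optional: match part if possible, else succeed consuming nothing."""
    def p(text: str, pos: int) -> Optional[int]:
        end = part(text, pos)
        return pos if end is None else end
    return p


# Hypothetical fragment modelled on ixml's term, option, repeat0, repeat1.
factor = name
option = seq(factor, lit("?"))
repeat0 = seq(factor, lit("*"))
repeat1 = seq(factor, lit("+"))

# Alternatives in the order the ixml grammar lists them.
term_naive = choice(factor, option, repeat0, repeat1)
# Longer alternatives first, bare factor last.
term_reordered = choice(option, repeat0, repeat1, factor)
# Left-factored: a factor, then at most one suffix picked by its first character.
term_factored = seq(factor, opt(choice(lit("?"), lit("*"), lit("+"))))


def accepts(term: Parser, text: str) -> bool:
    """True if `term` followed by a final '.' consumes the whole input."""
    rule = seq(term, lit("."))
    return rule(text, 0) == len(text)


if __name__ == "__main__":
    for label, term in [("naive", term_naive),
                        ("reordered", term_reordered),
                        ("factored", term_factored)]:
        print(label, accepts(term, "prolog?."))
    # naive False
    # reordered True
    # factored True
```

In this toy fragment the left-factored version does not depend on the order of its alternatives, because "?", "*", and "+" are single distinct characters and none is a prefix of another. Whether that property carries over to the full ixml grammar (for instance "*" versus "**", or the shared prefixes of tmark and mark) is the open question raised above.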
Received on Wednesday, 6 July 2022 16:06:47 UTC