Earley dot notation for X+ and X* from M Joel Dubinko on 2022-07-26 (public-ixml@w3.org from July 2022)

From: M Joel Dubinko <micah@dubinko.info>
Date: Mon, 25 Jul 2022 23:31:42 -0400
To: ixml <public-ixml@w3.org>
Message-Id: <43B64783-181C-4066-8909-FB6B212C9A70@dubinko.info>

Howdy y’all. Especially implementers.

Would love to compare notes.

In Earley parsing, it’s common to use “dot notation”. For example, if the sequence a b c is partly parsed, with a matched so far, the state is written a • b c. (It’s actually more complicated than this for already-parsed symbols, but that’s possibly not germane to this discussion.) Ultimately, in some fashion, the code needs to hold a representation equivalent to dot notation.
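
For concreteness, here is a minimal sketch of one way to hold that state: a rule plus an integer dot index and an origin position. The class name Item and its fields are assumptions made for illustration, not taken from any particular ixml implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical Earley item: a rule with a dot position and an origin."""
    lhs: str               # name of the rule being matched
    rhs: Tuple[str, ...]   # symbols on the right-hand side
    dot: int               # how many rhs symbols have been matched so far
    origin: int            # input position where this item was started

    def next_symbol(self) -> Optional[str]:
        """Symbol just after the dot, or None if the rule is finished."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def advance(self) -> "Item":
        """Return a new item with the dot moved one symbol to the right."""
        return Item(self.lhs, self.rhs, self.dot + 1, self.origin)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs}: {' '.join(symbols)} ({self.origin})"


item = Item("S", ("a", "b", "c"), dot=1, origin=0)
print(item)  # S: a • b c (0)
```

Because the item is frozen (hashable), it can be stored in a set per input position, which is a common way to avoid adding the same item twice.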

How do you manage this state with repeat0 / repeat1 expressions? You don’t know that you’re done until you try to grab the next item, and it fails.
One option is that you don’t. These expressions can always be represented with simpler rules by adding intermediate rules, as in the implementation hints section. If this is your implementation path, I’d love to hear any details.
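
A minimal sketch of that rewriting approach follows. It assumes a toy grammar encoding (a dict from rule name to a list of alternatives, with repetition marked by a trailing * or + on the symbol string) and hypothetical helper-rule names of the form X-star and X-plus; the exact shape of the rules in the hints section may differ, and a real implementation would need to guard against name clashes with user rules.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


def desugar(grammar: Grammar) -> Grammar:
    """Replace X* and X+ symbols with generated helper rules."""
    out: Grammar = {}

    def star(base: str) -> str:
        name = f"{base}-star"
        if name not in out:
            # X-star: empty | X, X-star
            out[name] = [(), (base, name)]
        return name

    def plus(base: str) -> str:
        name = f"{base}-plus"
        if name not in out:
            # X-plus: X, X-star
            out[name] = [(base, star(base))]
        return name

    def rewrite(symbol: str) -> str:
        if symbol.endswith("*"):
            return star(symbol[:-1])
        if symbol.endswith("+"):
            return plus(symbol[:-1])
        return symbol

    for lhs, alternatives in grammar.items():
        out[lhs] = [tuple(rewrite(s) for s in alt) for alt in alternatives]
    return out


print(desugar({"S": [("a+", "b*")]}))
```

After this pass every right-hand side is a plain sequence, so the dot is just an index again. The helper rules here are right-recursive; a left-recursive form would work with Earley as well.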

Otherwise, an expression like foo•+ looks odd, and feels more awkward to express in code.
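
One possible way to keep the repetition in the item itself, offered only as a sketch: treat the dot in front of X+ or X* as a state that can both stay put and move on. Since an Earley parser tracks all live items in parallel, it does not need to know when the repetition is done; it simply keeps both possibilities. The function names and the trailing * / + string encoding below are assumptions for illustration.

```python
from typing import List, Tuple

Rhs = Tuple[str, ...]
DottedRule = Tuple[Rhs, int]  # (right-hand side, dot position)


def skip_empty(rhs: Rhs, dot: int) -> List[DottedRule]:
    """Items reachable without matching anything: X* may match zero times."""
    if dot < len(rhs) and rhs[dot].endswith("*"):
        return [(rhs, dot + 1)]
    return []


def after_match(rhs: Rhs, dot: int) -> List[DottedRule]:
    """Items reachable once one occurrence of the symbol at the dot is matched."""
    moved = [(rhs, dot + 1)]          # the repetition (or plain symbol) is finished
    if rhs[dot].endswith(("*", "+")):
        moved.append((rhs, dot))      # stay in place to allow another occurrence
    return moved


rule = ("foo+", "bar")
print(after_match(rule, 0))  # [(('foo+', 'bar'), 1), (('foo+', 'bar'), 0)]
```

With this scheme X+ needs no extra flag: the dot only moves past foo+ after at least one foo has been matched, while X* additionally gets the zero-occurrence step from skip_empty at prediction time.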

I’m having a jolly time hacking on this parser. Just looking for clever ideas and/or inspiration.

Thanks! -j

Received on Tuesday, 26 July 2022 03:31:59 UTC