Re: Earley dot notation for X+ and X* from Steven Pemberton on 2022-07-26 (public-ixml@w3.org from July 2022)

From: Steven Pemberton <steven.pemberton@cwi.nl>
Date: Tue, 26 Jul 2022 12:01:06 +0000
To: M Joel Dubinko <micah@dubinko.info>, ixml <public-ixml@w3.org>
Message-Id: <1658836610669.4221473774.2455277456@cwi.nl>

Actually, the spec gives a hint as to what I do: I rewrite the rules into pure alternative form.

 http://invisiblexml.org/1.0/#hints



Optional factor:

f? ⇒ f-option
-f-option: f; ().

Zero or more repetitions:

f* ⇒ f-star
-f-star: (f, f-star)?.

One or more repetitions:

f+ ⇒ f-plus
-f-plus: f, f*.

One or more repetitions with separator:

f++sep ⇒ f-plus-sep
-f-plus-sep: f, (sep, f)*.

Zero or more repetitions with separator:

f**sep ⇒ f-star-sep
-f-star-sep: (f++sep)?.
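
As an illustration only (this is not the code of any actual ixml implementation), the rewrites above can be sketched in Python. The names desugar and dotted, the grammar representation (a dict from nonterminal to a list of alternatives, each a tuple of symbols), and the generated helper name "sep-f-star" are all assumptions made for this example. The sketch expands the nested ? and * in the hints one step further, so that every generated rule is a plain choice between sequences, and an Earley item then only needs a rule, an alternative, and a dot position.

```python
from typing import Dict, List, Optional, Tuple

Alternative = Tuple[str, ...]           # a sequence of symbols; () is the empty alternative
Grammar = Dict[str, List[Alternative]]  # nonterminal -> list of alternatives


def desugar(f: str, op: str, sep: Optional[str] = None,
            rules: Optional[Grammar] = None) -> Tuple[str, Grammar]:
    """Rewrite the factor `f op sep` into pure alternative form.

    Returns the name of the nonterminal that replaces the factor, plus the
    grammar holding the generated rules. Hypothetical helper for illustration.
    """
    rules = {} if rules is None else rules
    if op == "?":                        # f?  =>  f-option: f; ().
        name = f"{f}-option"
        rules[name] = [(f,), ()]
    elif op == "*":                      # f*  =>  f-star: f, f-star; ().
        name = f"{f}-star"
        rules[name] = [(f, name), ()]
    elif op == "+":                      # f+  =>  f-plus: f, f-star.
        name = f"{f}-plus"
        star, _ = desugar(f, "*", rules=rules)
        rules[name] = [(f, star)]
    elif op == "++":                     # f++sep  =>  f-plus-sep: f, (sep, f)*.
        if sep is None:
            raise ValueError("'++' needs a separator")
        name = f"{f}-plus-{sep}"
        tail = f"{sep}-{f}-star"         # assumed name standing in for (sep, f)*
        rules[tail] = [(sep, f, tail), ()]
        rules[name] = [(f, tail)]
    elif op == "**":                     # f**sep  =>  f-star-sep: (f++sep)?.
        if sep is None:
            raise ValueError("'**' needs a separator")
        name = f"{f}-star-{sep}"
        plus, _ = desugar(f, "++", sep, rules)
        rules[name] = [(plus,), ()]
    else:
        raise ValueError(f"unknown operator: {op!r}")
    return name, rules


def dotted(lhs: str, alt: Alternative, dot: int) -> str:
    """Render an Earley item over a desugared rule in dot notation."""
    symbols = list(alt)
    symbols.insert(dot, "•")
    return f"{lhs}: {' '.join(symbols)}"


if __name__ == "__main__":
    name, grammar = desugar("f", "**", "sep")
    for lhs, alternatives in grammar.items():
        for alt in alternatives:
            print(dotted(lhs, alt, 0))
```

With rules in this shape the dot only ever sits between two symbols of a flat sequence, so an item such as "f-star: f • f-star" never has to record a position inside a repetition.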


Steven

On Tuesday 26 July 2022 05:31:42 (+02:00), M Joel Dubinko wrote:

> Howdy y’all. Especially implementers.
>
> Would love to compare notes.
>
> In Earley notation, it’s common to use “dot notation”. For example if the sequence a b c is partly parsed: a • b c. (It’s actually more complicated than this for already-parsed symbols, but that’s possibly not germane to this discussion.) Ultimately, in some fashion, the code needs to hold a representation equivalent to dot notation.
>
> How do you manage this state with repeat0 / repeat1 expressions? You don’t know that you’re done until you try to grab the next item, and it fails.
> One option is that you don’t. These expressions can always be represented with simpler rules by adding intermediate rules, as in the implementation hints section. If this is your implementation path, I’d love to hear any details.
>
> Otherwise, an expression like foo•+ looks odd, and feels more awkward to express in code.
>
> I’m having a jolly time hacking on this parser. Just looking for clever ideas and/or inspiration.
>
> Thanks! -j

Received on Tuesday, 26 July 2022 12:01:41 UTC