- From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
- Date: Tue, 26 Jul 2022 08:07:21 -0600
- To: M Joel Dubinko <micah@dubinko.info>
- Cc: public-ixml@w3.org
M Joel Dubinko <micah@dubinko.info> writes: > In Earley notation, it’s common to use “dot notation”. For example if > the sequence a b c is partly parsed: a • b c. (It’s actually more > complicated than this for already-parsed symbols, but that’s possibly > not germane to this discussion.) Ultimately, in some fashion, the code > needs to hold a representation equivalent to dot notation. Yup. > How do you manage this state with repeat0 / repeat1 expressions? You > don’t know that you’re done until you try to grab the next item, and > it fails. > One option I that you don’t. These expressions can always be > represented with simpler rules by adding intermediate rules, as in the > implementation hints section. If this is your implementation path, I’d > love to hear any details. As far as I know, all the implementations except Aparecium translate the user-specified EBNF into an equivalent BNF using either the equivalences given in the spec or others, so as you conjecture they can use the conventional Earley items. > Otherwise, an expression like foo•+ looks odd, and feels more awkward > to express in code. For what it's worth, Aparecium turns each production rule into a finite state automaton, using the Gluschkov construction (as described by Anne Brüggemann-Klein in one of her papers); the location in the rule is then given by the identifier for the state, and in the Gluschkov automaton each basic symbol occurrence in the source regular expression is a state. Marking the state by putting a dot after the symbol is unambiguous, and does not look any odder to me that in conventional Earley notation, but ymmv. Michael -- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com
Received on Tuesday, 26 July 2022 14:21:12 UTC