Re: Ixml implementation design review notes

Appreciate the comments, Michael.

I should have mentioned, my focus at this early stage is building a strong foundation that I can incrementally push forward until it supports all of ixml. I’m mainly focused on getting core Earley parsing correct, and setting up fundamental data structures. (And grokking Rust!)

I’ll respond to some of the comments below. But first, a general question: I know about the ixml test suite [1], but is there anything comparable for testing a bare Earley parser? (For example, given grammar X and input Y, you should expect a trace Z as follows...)

(And apologies if I missed it, but is there a zip or other convenient way to download the whole ixml suite? I’m not ready for it yet, but I’m still aiming to have something semi-presentable by Balisage.)


> Here and below you seem to use 'rule' to refer to what I would call
> expressions --

You’re 100% right. I’ve updated this in my local tree (and below). It seems a good idea to follow the spec’s terminology as closely as possible. That said, the ixml grammar itself has no ‘expression’ or ‘expr’ production. Maybe there should be, if only for clarity. (?)

> Are you interning only literals and nonterminals, or will structures
> like Seq() and OneOf() also be interned, so that in a grammar like
> 
>  S = A, B.
>  A = 'a'; ('bc').
>  B = 'a'; ('bc').
> 
> the production rules for A and B will point to the same OneOf() object?
> I'm just wondering about the cost.

Well, at this moment it’s up to the caller to keep this straight. As in:

    let choice: ExprRef = grammar.add_oneof(…);
    grammar.add_rule("A", choice);
    grammar.add_rule("B", choice);

I’m trying to figure out if there would be any practical confusion resulting from getting this wrong.
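For comparison, here is a minimal sketch of what automatic interning of structures could look like, so that the caller would not have to keep it straight. Everything in it is an assumption for illustration: the `Expr` variants, the `intern` method, and the layout of `Grammar` are hypothetical and not what is in my tree today. The idea is simply a hash map from expression structure to handle, so structurally identical `OneOf()` values resolve to the same `ExprRef`.

```rust
use std::collections::HashMap;

// Hypothetical handle into the grammar's expression storage.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ExprRef(usize);

// Hypothetical expression shapes; a real implementation would have more.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Lit(String),
    Seq(Vec<ExprRef>),
    OneOf(Vec<ExprRef>),
}

#[derive(Default)]
struct Grammar {
    exprs: Vec<Expr>,                 // storage, indexed by ExprRef
    interned: HashMap<Expr, ExprRef>, // structure -> existing handle
    rules: HashMap<String, ExprRef>,  // rule name -> body
}

impl Grammar {
    // Return the existing handle for a structurally identical expression,
    // or store the expression and hand back a fresh handle.
    fn intern(&mut self, expr: Expr) -> ExprRef {
        if let Some(&existing) = self.interned.get(&expr) {
            return existing;
        }
        let handle = ExprRef(self.exprs.len());
        self.exprs.push(expr.clone());
        self.interned.insert(expr, handle);
        handle
    }

    fn add_rule(&mut self, name: &str, body: ExprRef) {
        self.rules.insert(name.to_string(), body);
    }
}

fn main() {
    let mut grammar = Grammar::default();

    // A = 'a'; ('bc').
    let a1 = grammar.intern(Expr::Lit("a".to_string()));
    let bc1 = grammar.intern(Expr::Lit("bc".to_string()));
    let choice_a = grammar.intern(Expr::OneOf(vec![a1, bc1]));
    grammar.add_rule("A", choice_a);

    // B = 'a'; ('bc').  Built independently of A.
    let a2 = grammar.intern(Expr::Lit("a".to_string()));
    let bc2 = grammar.intern(Expr::Lit("bc".to_string()));
    let choice_b = grammar.intern(Expr::OneOf(vec![a2, bc2]));
    grammar.add_rule("B", choice_b);

    // Both rules end up pointing at the same OneOf() entry.
    assert_eq!(choice_a, choice_b);
    assert_eq!(grammar.rules["A"], grammar.rules["B"]);
}
```

The cost in this sketch is one hash lookup plus a clone per newly stored expression. Because children are already handles, hashing a `Seq()` or `OneOf()` only touches a short vector of integers rather than the whole subtree.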

> I think you may be missing insertions (of the form +"literal"), unless
> you are expecting to treat them as LitChar().

Yes, missing that and several other things :)

> 
> Am I right to read this as indicating you plan to level the differences
> between
> 
>  ['a'; 'b'; '0'-'9']
> 
> and
> 
>  ('a'; 'b'; '0'-'9')

There’s more structure to character “sets” than I’ve currently sketched out. Both of these would have an identical effect on shaping the allowed grammar, correct?

Thanks!
-j

[1] https://invisiblexml.org/test-catalog/

Received on Friday, 22 July 2022 17:18:42 UTC