Re: Ixml implementation design review notes from C. M. Sperberg-McQueen on 2022-07-23 (public-ixml@w3.org from July 2022)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Sat, 23 Jul 2022 10:46:16 -0600
To: M Joel Dubinko <micah@dubinko.info>
Cc: public-ixml@w3.org
Message-ID: <87ilnn22b0.fsf@blackmesatech.com>

M Joel Dubinko <micah@dubinko.info> writes:

> Appreciate the comments, Michael.

> I should have mentioned, my focus at this early stage is building a
> strong foundation that I can incrementally push forward until it
> supports all of ixml. I’m mainly focused on getting core Earley
> parsing correct, and setting up fundamental data structures. (And
> grokking Rust!)

Understood.  I have similar hopes for a future Earley parser in another
programming language.

> I’ll respond to some of the comments below. But first, a general
> question: I know about the ixml test suite [1], but is there anything
> comparable for testing a bare Earley parser? (For example, given
> grammar X and input Y, you should expect a trace Z as follows...)

Not in the ixml test suite.  I have not looked at the web more
generally.

My understanding, for what it's worth, is that Coffeepot and jωiXML both
have ways of dumping their internal data structures (and Aparecium will
have a run-time option for doing so rsn, though currently I make that
happen by just adding an unparseable extra character to the input in
order to make the parse fail, so that Aparecium dumps its Earley
items).  So you can at least get something.  Since I think both
Coffeepot and jωiXML translate ixml into a more restricted BNF syntax
internally, and Aparecium does not, there is likely to be some daylight
among the results on a given test.
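
To give a rough idea of what such a dump contains, here is a minimal
sketch of a bare Earley recognizer in Rust that keeps every item set so
the sets can be printed and compared.  It is not taken from Coffeepot,
jωiXML, or Aparecium; the names (Sym, Rule, Item, earley_sets, show,
sample_grammar) are made up for illustration, and it assumes a grammar
with no empty right-hand sides, which a real implementation would have
to handle.

```rust
/// A grammar symbol: a terminal character or a named nonterminal.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Sym { T(char), N(&'static str) }

/// One production. Assumption: `rhs` is never empty in this sketch.
struct Rule { lhs: &'static str, rhs: Vec<Sym> }

/// An Earley item: a rule, a dot position, and the set where it started.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Item { rule: usize, dot: usize, origin: usize }

/// Build all Earley sets for `input`; sets[i] holds the items after i characters.
fn earley_sets(rules: &[Rule], start: &str, input: &str) -> Vec<Vec<Item>> {
    let chars: Vec<char> = input.chars().collect();
    let mut sets: Vec<Vec<Item>> = vec![Vec::new(); chars.len() + 1];
    for (r, rule) in rules.iter().enumerate() {
        if rule.lhs == start { sets[0].push(Item { rule: r, dot: 0, origin: 0 }); }
    }
    for i in 0..=chars.len() {
        let mut k = 0;
        while k < sets[i].len() {
            let item = sets[i][k];
            k += 1;
            let mut here: Vec<Item> = Vec::new(); // additions to sets[i]
            let mut next: Vec<Item> = Vec::new(); // additions to sets[i + 1]
            match rules[item.rule].rhs.get(item.dot).copied() {
                // Predictor: expand the nonterminal after the dot.
                Some(Sym::N(name)) => {
                    for (r, rule) in rules.iter().enumerate() {
                        if rule.lhs == name { here.push(Item { rule: r, dot: 0, origin: i }); }
                    }
                }
                // Scanner: move the dot over a matching input character.
                Some(Sym::T(c)) => {
                    if i < chars.len() && chars[i] == c {
                        next.push(Item { dot: item.dot + 1, ..item });
                    }
                }
                // Completer: advance every item that was waiting for this rule's lhs.
                None => {
                    let lhs = rules[item.rule].lhs;
                    for parent in &sets[item.origin] {
                        if rules[parent.rule].rhs.get(parent.dot) == Some(&Sym::N(lhs)) {
                            here.push(Item { dot: parent.dot + 1, ..*parent });
                        }
                    }
                }
            }
            for it in here { if !sets[i].contains(&it) { sets[i].push(it); } }
            for it in next { if !sets[i + 1].contains(&it) { sets[i + 1].push(it); } }
        }
    }
    sets
}

/// True if the last set holds a completed start rule that began at position 0.
fn accepts(rules: &[Rule], start: &str, input: &str) -> bool {
    let sets = earley_sets(rules, start, input);
    sets.last().unwrap().iter().any(|it| {
        it.origin == 0 && rules[it.rule].lhs == start && it.dot == rules[it.rule].rhs.len()
    })
}

/// Render one item in dotted-rule form for a trace dump.
fn show(rules: &[Rule], item: &Item) -> String {
    let rule = &rules[item.rule];
    let mut parts: Vec<String> = Vec::new();
    for (k, sym) in rule.rhs.iter().enumerate() {
        if k == item.dot { parts.push(".".to_string()); }
        parts.push(match sym { Sym::T(c) => format!("'{}'", c), Sym::N(n) => n.to_string() });
    }
    if item.dot == rule.rhs.len() { parts.push(".".to_string()); }
    format!("{} -> {} (from {})", rule.lhs, parts.join(" "), item.origin)
}

/// Hypothetical test grammar: S -> S '+' N | N ;  N -> '1'
fn sample_grammar() -> Vec<Rule> {
    vec![
        Rule { lhs: "S", rhs: vec![Sym::N("S"), Sym::T('+'), Sym::N("N")] },
        Rule { lhs: "S", rhs: vec![Sym::N("N")] },
        Rule { lhs: "N", rhs: vec![Sym::T('1')] },
    ]
}
```

Calling earley_sets(&sample_grammar(), "S", "1+1") and printing each item
with show gives a trace of the kind asked about.  The order of items
within a set depends on the implementation, so comparing traces as
unordered sets is likely to be more robust than comparing them line by
line.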

> (And apologies if I missed it, but is there a zip or other way to
> conveniently download the whole ixml suite? Not ready for it yet, but
> still aiming to have something semi-presentable by Balisage)

It's not a separate github repo, but if you download the zip for the
Invisible XML ixml repo at

  https://github.com/invisiblexml/ixml

the tests/ directory has what you need.

>  ...

>  Am I right to read this as indicating you plan to level the differences
>  between
>
>   ['a'; 'b'; '0'-'9']
>
>  and
>
>   ('a'; 'b'; '0'-'9')

> There’s more structure to character “sets” than I’ve currently
> sketched out. Both of these would have an identical effect on shaping
> the allowed grammar, correct?

Yes, unless I made a mistake constructing the example.
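
As a sanity check on that claim, a small sketch in Rust follows.  It
models the bracketed form as one character set and the parenthesized
form as an alternation of single-member sets, and both accept the same
single characters.  The types and names (Member, Expr, matches_char,
bracketed, parenthesized) are invented for this illustration and are
not part of any ixml implementation; treating the range alternative as
a one-member set is likewise an assumption of the sketch.

```rust
/// One member of a character set: a single character or an inclusive range.
enum Member { Char(char), Range(char, char) }

impl Member {
    fn matches(&self, c: char) -> bool {
        match self {
            Member::Char(x) => *x == c,
            Member::Range(lo, hi) => *lo <= c && c <= *hi,
        }
    }
}

/// Either a character set or an alternation of sub-expressions.
enum Expr { Set(Vec<Member>), Alt(Vec<Expr>) }

/// True if `e` accepts the single character `c`.
fn matches_char(e: &Expr, c: char) -> bool {
    match e {
        Expr::Set(members) => members.iter().any(|m| m.matches(c)),
        Expr::Alt(alts) => alts.iter().any(|a| matches_char(a, c)),
    }
}

/// Models the bracketed form: one set with three members.
fn bracketed() -> Expr {
    Expr::Set(vec![Member::Char('a'), Member::Char('b'), Member::Range('0', '9')])
}

/// Models the parenthesized form: three alternatives, each a one-member set.
fn parenthesized() -> Expr {
    Expr::Alt(vec![
        Expr::Set(vec![Member::Char('a')]),
        Expr::Set(vec![Member::Char('b')]),
        Expr::Set(vec![Member::Range('0', '9')]),
    ])
}
```

The two forms recognize the same language; any difference between them
would show up only in how the implementation represents them
internally.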

-- 
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com

Received on Saturday, 23 July 2022 16:59:17 UTC