Re: Beginners' errors

Dear Steven (cc ixml),

Thank you for this posting, which is helpful for those of us who are
beginning to find our way around ixml. As I've been learning how to think
about grammars, I've found your tutorials and Norm's, linked from the main
https://invisiblexml.org/ page, especially valuable. The "Grammars are not
regular expressions" page in your advanced tutorial, and its explanation of
how greediness distinguishes regular expressions from ixml productions,
helped foreground a distinction that is an easy point of confusion or
misunderstanding for new users.

The hanayama exercise is mine, and I undertook it as a
modestly sized opportunity to practice using ixml and XProc. It's part of
https://github.com/djbpitt/ixml, which is just a sandbox for
experimentation. I had already modified the production for newline before
your posting this morning, but the parsers still report ambiguities. Since
you close your posting with a generous offer of further advice, I'd be
grateful for whatever you're able and willing to share.

Sincerely,

David (Birnbaum, djbpitt@gmail.com)

On Fri, Jan 17, 2025 at 9:16 AM Steven Pemberton <steven.pemberton@cwi.nl>
wrote:

> To keep track of how my implementation is doing, I check its logs every
> now and then, to see if it is failing anywhere, and to get a feel for what
> people are doing.
>
> The most frequent beginner's error I see is not putting the top-level
> rule at the beginning of the grammar. (Another is submitting PDF files
> instead of text files.)
>
> However, another mistake I see is testing a grammar for the first time on
> a huge input file. Typically the grammar is (immensely) ambiguous, and the
> huge input either takes an inordinate amount of time, and they think it has
> failed, or it runs out of memory and really does fail.
>
> So advice: test your grammars on small amounts of input to smoke out the
> ambiguity errors before running them on large input files.
>
> And if the author of the "hanayama" grammar is reading this, newline* should
> be newline+, and then you'll get some output. (And if you want more
> advice, feel free to contact me).
>
> Steven
>

Received on Friday, 17 January 2025 16:12:06 UTC