- From: David Birnbaum <djbpitt@gmail.com>
- Date: Tue, 28 Jan 2025 12:35:40 -0500
- To: graydonish@gmail.com
- Cc: "Liam R. E. Quin" <liam@fromoldbooks.org>, LdBeth <andpuke@foxmail.com>, ixml <public-ixml@w3.org>
- Message-ID: <CAP4v81rZmRm1Sv7VdoDRd6ewjM+jOjVhiUBJpVWu_PZyfhRhQg@mail.gmail.com>
Dear Graydon (cc public-ixml),
Thank you; your description of the distinction between "the plain text
conforms to a clear, consistent, and documented (or, at least,
documentable) structure" and "a human recognizes the structure of the plain
text but that structure is not represented in a clear, consistent, and
documented way, and must therefore be discovered, perhaps painfully" is
helpful.
Best,
David
On Tue, Jan 28, 2025 at 11:50 AM Graydon <graydonish@gmail.com> wrote:
> On Tue, Jan 28, 2025 at 12:40:04AM -0500, David Birnbaum scripsit:
> > This leaves me still wondering whether there are rules of thumb for
> > choosing between using regex (e.g., analyze-string()) and using ixml
> > when both are available.
>
> Have you got rules, or do you need rules?
>
> E.g., "this string is a citation conforming to a known set of written
> rules" or "I need to contract with my client that the generated text
> will conform to productions of an agreed grammar". (Or "these are 80
> column records with known fields".)
>
> iXML is good for those.
>
> If it's "the rules must be discovered", as in the sort of conversion
> project where you've used fifteen passes to walk source to target in
> comprehensible steps, iXML is NOT good for those. (In theory, yes,
> a grammar could be written, but the cognitive load to write it is not
> an especially practical undertaking.)
>
> I am really looking forward to being able to pass those citation strings
> to a grammar; having to tweeze them apart with regular expressions fails
> to delight.
>
> -- Graydon
>
> --
> Graydon Saunders | graydonish@fastmail.com
> Þæs oferéode, ðisses swá mæg.
> -- Deor ("That passed, so may this.")
>
Received on Tuesday, 28 January 2025 17:35:56 UTC