- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Mon, 17 Apr 2023 09:59:59 +0100
- To: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
- Cc: public-ixml@w3.org
- Message-ID: <m2fs8y50co.fsf@saxonica.com>
> * In general, start with ixml.
>
> Even if the structure of control sequences is as simple as the regex
> '\|[^\|]+\|', use ixml, not an editor or a transform with a bunch of
> regex matches. Ixml provides a record of the transformation (much
> better than "oh, it was a couple of hours' work in Emacs"). It
> makes it easy to handle different control sequences differently, for
> whatever reason.
It is very unfortunate that the diagram on page 22 uses vertical bars,
and that those bars show up literally in the data:
0 1. |w30||wid||picn| *
0 2. |n||e|#|e|###############'exponent#part' |g|?
0 3. |e|################|ul4||ul||ul||ulb||ul4||ul||ul| |g|?
0 4. |e|###############|#############| |g|?
0 5. |e|######'times#ten#to#the###'power#of#ten' |g|?
0 6. |e|########power#choice'########|ul||ul||ul||ulb||ul||ul||ul| |g|?
0 7. |e|#############|##############|#######| |g|?
0 8. |e|#############|######'plusminus###'fixed#point |g|?
0 9. |e|#############|########option'######numeral' |g|?
0 10. |e|#############|#########|###############| |g|?
And then there’s this bit of weirdness just a little further on:
0 101. <L2F2>|S25|S5'|S1'times#ten#to#the|S122|S5'|S1'power#of#ten|S5'|S1' *
0 102. <LF2>|S40power#choice|S5'|S1' *
0 103. <LF2>|S204|S5'|S1'plusminus|S130|S5'|S1'fixed#point *
0 104. <LF2>|S223option|S5'|S1'|S159numeral|S5'|S1' *
> If anyone reading this has further advice, or different advice, I would
> be glad to hear it.
What you outline sounds plausible to me. Using a record-oriented parse,
I was able to get simple output very quickly. I struggled to write a
grammar for extracting the control sequences that wasn’t ambiguous. I’m
out of practice, I guess.
A quick Python program to identify what all the unique control sequences
*are* tripped over the vertical bars in the diagram and over whatever
the heck is going on in the lines that begin <Lxxx>. And the day job
beckons.
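For the record, a minimal sketch of that kind of program (not the exact
script; it assumes the regex from the quoted message, and the function
name and the two sample lines are just for illustration):

```python
import re
from collections import Counter

# Assumption: a control sequence is anything matching the regex from the
# quoted message, i.e. a bar, one or more non-bar characters, a bar.
CONTROL = re.compile(r"\|[^|]+\|")


def count_control_sequences(lines):
    """Count every distinct bar-delimited sequence found in the lines."""
    counts = Counter()
    for line in lines:
        counts.update(CONTROL.findall(line))
    return counts


# Two lines copied from the data above: one ordinary line and one
# diagram line that uses vertical bars as drawing characters.
sample = [
    "0 1. |w30||wid||picn| *",
    "0 4. |e|###############|#############| |g|?",
]

counts = count_control_sequences(sample)
for sequence, n in sorted(counts.items()):
    print(n, sequence)
```

On the diagram line, the bars drawn in the picture pair up, so
|#############| is reported as if it were a control sequence. The <Lxxx>
lines, where sequences such as |S25 have no closing bar, do not fit this
pattern at all.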
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Received on Monday, 17 April 2023 09:23:16 UTC