- From: David Birnbaum <djbpitt@gmail.com>
- Date: Mon, 27 Jan 2025 17:34:29 -0500
- To: ixml <public-ixml@w3.org>
- Message-ID: <CAP4v81pbXqTFt0G+CJj_QpFOvFHpu7A+r8aov7fFPZMSfi3UoQ@mail.gmail.com>
Dear ixml list,

I'm using ixml to tag a plain-text novel in which each chapter begins with a roman numeral, a dot, a space, and an upper-case title (which may include spaces and a few punctuation marks), e.g.:

    VI. FAKE TITLE FOR CHAPTER SIX

That pattern is easy to model. The issue is that my model for a line of regular narrative text (which may contain all of those characters and more) overlaps with it. A line of regular text may include every character allowed in a chapter-title line, except that it never begins with something that matches a roman numeral followed by a dot.

To make matters more complicated, there is one embedded subsection, within a chapter, whose subtitle is all upper-case but has no leading roman numeral, along the lines of:

    FAKE HEADING FOR SUBSECTION EMBEDDED INSIDE NUMBERED CHAPTER

Chapter-title lines are preceded by four newlines and followed by two. I might have been able to use that pattern, except that the embedded-subsection title line is framed the same way.

I can get from plain text to XML with pipelining (for that matter, I can do it with a pure-XSLT pipeline), because with pipelining I can tag just the chapter-title lines first and then go back and tag the rest, since the first pass takes the chapter-title lines out of consideration. And with ixml, if I rely on the four newlines before and two newlines after both the chapter-title lines and the subsection title line, I can tag all of them the same way and then patch up the incorrect tagging of the embedded subsection in a separate, subsequent XSLT step.

But … in the interest of learning How To Do Stuff with ixml, I'd like to understand whether it's possible to write an unambiguous ixml grammar that recognizes chapter-title lines and does not confuse them with either regular text lines or the annoying embedded-subsection title line.
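For illustration only, the classification rule described above can be sketched outside ixml. This is a minimal sketch in Python using the standard `re` module, not an ixml grammar and not the pipeline mentioned in the post. The names `TITLE`, `CHAPTER`, `SUBSECTION`, and `classify_lines` are hypothetical; the set of punctuation allowed in titles is a guess; and a roman numeral is approximated as any run of the letters I, V, X, L, C, D, M.

```python
import re

# Assumption: title alphabet is upper-case letters, spaces, and a guessed
# set of punctuation marks (the post does not list the exact ones).
TITLE = r"[A-Z][A-Z ,;:'!?\-]*"
# Assumption: a roman numeral is approximated by any run of I, V, X, L, C, D, M.
CHAPTER = re.compile(rf"[IVXLCDM]+\. {TITLE}")
SUBSECTION = re.compile(TITLE)


def classify_lines(text: str) -> list[tuple[str, str]]:
    """Label each non-empty line as 'chapter', 'subsection', or 'text'.

    A heading candidate is a line with three empty lines before it (four
    newlines) and one empty line after it (two newlines). Among candidates,
    the roman-numeral prefix is what separates a chapter title from the
    embedded subsection title.
    """
    lines = text.split("\n")
    labelled = []
    for i, line in enumerate(lines):
        if not line:
            continue  # skip the empty lines produced by blank-line runs
        framed = (
            i >= 3
            and lines[i - 3:i] == ["", "", ""]
            and i + 1 < len(lines)
            and lines[i + 1] == ""
        )
        # The chapter check runs first, so a numbered title never falls
        # through to the plain upper-case subsection check.
        if framed and CHAPTER.fullmatch(line):
            labelled.append(("chapter", line))
        elif framed and SUBSECTION.fullmatch(line):
            labelled.append(("subsection", line))
        else:
            labelled.append(("text", line))
    return labelled
```

The sketch relies on the order of the two checks. Alternatives in an ixml grammar are not tried in order, so a grammar would have to state the same distinction in the rules themselves, which is what the question above is asking about.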
Is there an ixml idiom for this that I haven't learned yet, or am I asking ixml to do something it isn't designed to do?

Thanks in advance for any clarification!

Best,

David (djbpitt@gmail.com)
Received on Monday, 27 January 2025 22:34:44 UTC