Ambiguity (what else!?) question

Dear ixml list,

I'm using ixml to tag a plain-text novel where chapters begin with a roman
numeral, a dot, a space, and an upper-case title (which may include spaces
and a few punctuation marks), e.g.:

VI. FAKE TITLE FOR CHAPTER SIX

That pattern is easy to model; the issue is that my model for a line of
regular narrative text overlaps with it. A line of regular text may include
all of the characters allowed in a chapter-title line (and more); the only
difference is that a line of regular text never begins with something that
matches a roman numeral followed by a dot.
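
To make the overlap concrete, here is a minimal sketch in Python (not ixml)
of the chapter-title pattern described above. The punctuation allowed in a
title is an assumption, since the message only says "a few punctuation
marks", and the name is_chapter_title is hypothetical.

```python
import re

# Roman numeral, dot, space, then an upper-case title.
# Assumption: the title's punctuation set is a guess, not taken from the novel.
CHAPTER_TITLE = re.compile(r"^[IVXLCDM]+\. [A-Z][A-Z ,.'!?-]*$")


def is_chapter_title(line: str) -> bool:
    """Return True if the whole line looks like a numbered chapter title."""
    return CHAPTER_TITLE.match(line) is not None
```

Only the leading "numeral, dot, space" separates a title from other lines
here; every character after it could equally appear in regular text or in
the all-upper-case subsection heading.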

To make matters more complicated, there is one embedded subsection, within
a chapter, that has a sub-title that is all upper-case, but without the
leading roman numeral, along the lines of:

FAKE HEADING FOR SUBSECTION EMBEDDED INSIDE NUMBERED CHAPTER

Chapter-title lines are preceded by four newlines and followed by two, a
pattern that I might have been able to use, except that the same is true of
the embedded-subsection title line.

I can get from plain text to XML with pipelining (for that matter, I can do
it with a pure-XSLT pipeline) because with pipelining I can tag just the
chapter-title lines first and then go back and tag the rest, having taken
the chapter-title lines out of consideration on the first pass. And with
ixml, if I rely on the four newlines before and two newlines after both the
chapter-title lines and the subsection-title line, I can tag all of those
the same way and then patch up the incorrect tagging of the embedded
subsection with a separate, subsequent XSLT step. But …
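
The two-pass idea can be sketched roughly as follows, in Python rather than
XSLT or ixml. The element names chapter-title and line, the function names,
and the title punctuation set are all assumptions made for illustration;
this is not the actual pipeline.

```python
import re
from xml.sax.saxutils import escape

# Assumption: same guessed title pattern as in the description of the input.
TITLE = re.compile(r"^[IVXLCDM]+\. [A-Z][A-Z ,.'!?-]*$")


def first_pass(lines):
    """Pass 1: claim chapter-title lines; leave every other line unclaimed."""
    return [("chapter-title" if TITLE.match(l) else None, l) for l in lines]


def second_pass(marked):
    """Pass 2: tag whatever the first pass did not claim as a plain line."""
    out = []
    for name, line in marked:
        if not line.strip():
            continue  # blank lines only separate blocks in this sketch
        element = name or "line"  # hypothetical element names
        out.append(f"<{element}>{escape(line)}</{element}>")
    return out


def tag(text):
    """Run both passes over the raw text and return the tagged lines."""
    return second_pass(first_pass(text.split("\n")))
```

Because the first pass has already claimed the chapter titles, the second
pass never has to choose between "title" and "text" for the same line, which
is exactly the choice a single grammar faces.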

In the interest of learning How To Do Stuff with ixml, I'd like to
understand whether it's possible to write an unambiguous ixml grammar to
tag the document in a way that recognizes chapter-heading lines and does
not confuse them with either regular text lines or the annoying embedded
subsection header line. Is there an ixml idiom for this that I haven't
learned yet, or am I asking ixml to do something it isn't designed to do?

Thanks in advance for any clarification!

Best,

David (djbpitt@gmail.com)

Received on Monday, 27 January 2025 22:34:44 UTC