- From: David Birnbaum <djbpitt@gmail.com>
- Date: Thu, 20 Mar 2025 23:05:45 -0400
- To: ixml <public-ixml@w3.org>
Dear ixml list,

Is there an ixml idiom for distinguishing lines of input according to whether they do or do not begin with a specific multicharacter pattern? For example, given consecutive lines, some of which begin with the string "Chapter " (including the trailing space), I'd like to recognize <chapterLine> and <nonChapterLine> elements. The former are easy, but I struggle to define a pattern that says "a sequence of characters, but only when the first eight are not the string 'Chapter '". That is, I don't know how to match a nonChapterLine without also possibly matching a chapterLine. I can use ~["C"] to say "any *single* character that isn't 'C'", but a nonChapterLine could begin with "C" (or, for that matter, with "Ch" or "Cha", etc.).

Assuming I can be confident that a nonChapterLine cannot begin with the eight-character sequence "Chapter ", is it possible to construct an unambiguous grammar that will distinguish the two types of lines?

I can think of workarounds, such as tagging only chapter lines and then handling the untagged lines between them in a separate pipeline step. But is it possible to tag all lines of either type unambiguously with a single ixml grammar?

Thank you for any suggestions.

Sincerely,
David
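A minimal sketch of one possible approach in ixml follows. It is not a tested answer, and it assumes that every line, including the last, ends with a newline (#a). The root name text is also an arbitrary choice. Rather than trying to exclude the string "Chapter " directly, the sketch spells out the ways a line can differ from it. A nonChapterLine is either empty, or first diverges from "Chapter " at some position k (0 through 7), or ends after a proper prefix such as "Chap". Because each alternative is pinned to a different k, no input line should match more than one rule.

```ixml
{ Sketch only: assumes every line, including the last, ends with #a. }
text: line*.
-line: chapterLine; nonChapterLine.

{ A line that begins with the eight characters "Chapter ". }
chapterLine: "Chapter ", ~[#a]*, -#a.

{ Any other line: empty, starting with something other than "C",
  or matching a proper prefix of "Chapter " and then either ending
  or continuing with a character that breaks the match. }
nonChapterLine: (~["C"; #a], ~[#a]*)?, -#a;
                "C", (~["h"; #a], ~[#a]*)?, -#a;
                "Ch", (~["a"; #a], ~[#a]*)?, -#a;
                "Cha", (~["p"; #a], ~[#a]*)?, -#a;
                "Chap", (~["t"; #a], ~[#a]*)?, -#a;
                "Chapt", (~["e"; #a], ~[#a]*)?, -#a;
                "Chapte", (~["r"; #a], ~[#a]*)?, -#a;
                "Chapter", (~[" "; #a], ~[#a]*)?, -#a.
```

Under this sketch, a line such as "Chapter 1" can only be a chapterLine. "Chapters" falls through to nonChapterLine because it diverges at k = 7, and "Chap" does so because the line ends after a proper prefix. The cost is that the grammar grows linearly with the length of the excluded string.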
Received on Friday, 21 March 2025 03:06:03 UTC