- From: Steven Pemberton <steven.pemberton@cwi.nl>
- Date: Fri, 21 Mar 2025 08:22:38 +0000
- To: public-ixml@w3.org
Clearly a use case for the proposed 'not' construction...

    chapterline: "Chapter ", ~[#a]+.
    nonchapter: !"Chapter ", ~[#a]+.

Steven

On Friday 21 March 2025 04:05:45 (+01:00), David Birnbaum wrote:
> Dear ixml list,
>
> Is there an ixml idiom for distinguishing lines of input according to whether they do or do not begin with a specific multicharacter pattern? For example, given consecutive lines, some of which begin with the string “Chapter ”, I’d like to recognize <chapterLine> and <nonChapterLine> elements. The former are easy, but I struggle to define a pattern that says “sequence of characters only when the first eight are not the string ‘Chapter ’”. That is, I don’t know how to match a nonChapterLine without also possibly matching a chapterLine. I can use ~["C"] to say “any *single* character that isn’t "C"”, but a nonChapterLine could begin with “C” (or, for that matter, with “Ch” or “Cha”, etc.). Assuming I can be confident that a nonChapterLine cannot begin with the eight-character sequence “Chapter ”, is it possible to construct an unambiguous grammar that will distinguish the two types of lines?
>
> I can think of workarounds, such as tagging only chapter lines and then managing the untagged stuff between them in a separate pipeline step. But is it possible to tag all lines of either type unambiguously with just a single ixml grammar? Thank you for any suggestions.
>
> Sincerely,
>
> David
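If the proposed `!` works as a negative lookahead (consuming no input and succeeding only when the following text does not match "Chapter "), its effect on line classification can be illustrated with Python's `re` lookahead `(?!...)`. This is a sketch of the intended semantics only; it is not ixml and not the API of any ixml processor. For comparison, it also builds a lookahead-free pattern that spells out every way a line can depart from "Chapter ": starting with something other than "C", or with "C" followed by something other than "h", and so on, or consisting of a proper prefix such as "Chap". The same enumeration could presumably be written by hand as alternatives in a grammar that lacks the proposed construction. The names `prefix_enumeration`, `classify_lines`, and the `unmatched` tag are hypothetical; `chapterLine` and `nonChapterLine` come from David's message.

```python
import re

PREFIX = "Chapter "

# Mirrors   chapterline: "Chapter ", ~[#a]+.
# ([^\n] is any character except a newline, like ~[#a] in ixml.)
CHAPTER = re.compile(re.escape(PREFIX) + r"[^\n]+")

# Mirrors the proposed   nonchapter: !"Chapter ", ~[#a]+.
# (?!...) is a zero-width negative lookahead: it consumes no input and
# fails if the text at this position starts with PREFIX.
NON_CHAPTER_LOOKAHEAD = re.compile("(?!" + re.escape(PREFIX) + r")[^\n]+")


def prefix_enumeration(prefix: str) -> str:
    """Hypothetical helper: a lookahead-free pattern for a non-empty line
    that does not start with `prefix`, built by listing where the line
    first departs from it."""
    if not prefix:
        raise ValueError("prefix must be non-empty")
    alternatives = []
    for i, ch in enumerate(prefix):
        head = re.escape(prefix[:i])
        # The line agrees with prefix[:i], then has a different character.
        alternatives.append(head + "[^" + re.escape(ch) + r"\n][^\n]*")
        if i > 0:
            # The line stops after a proper prefix, e.g. "Chap".
            alternatives.append(head)
    return "(?:" + "|".join(alternatives) + ")"


NON_CHAPTER_ENUM = re.compile(prefix_enumeration(PREFIX))


def classify_lines(text: str, use_lookahead: bool = True) -> list[tuple[str, str]]:
    """Tag each line; the tag names follow the question in the thread."""
    non_chapter = NON_CHAPTER_LOOKAHEAD if use_lookahead else NON_CHAPTER_ENUM
    tagged = []
    for line in text.split("\n"):
        if CHAPTER.fullmatch(line):
            tagged.append(("chapterLine", line))
        elif non_chapter.fullmatch(line):
            tagged.append(("nonChapterLine", line))
        else:
            # Neither rule matches, e.g. an empty line or a bare "Chapter ".
            tagged.append(("unmatched", line))
    return tagged


if __name__ == "__main__":
    sample = "Chapter 1\nIt was a dark night.\nChapter\nChapter 2"
    for tag, line in classify_lines(sample):
        print(f"<{tag}>{line}</{tag}>")
```

On the sample input, the first and last lines are tagged `chapterLine` and the middle two `nonChapterLine` ("Chapter" without the trailing space does not start with "Chapter "). Both patterns give the same result, because each alternative in the enumeration covers one position at which a line can first differ from the prefix, so no line is claimed by both rules.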
Received on Friday, 21 March 2025 08:22:44 UTC