Re: Line/string triage?

Clearly a use case for the proposed 'not' construction...

 chapterline: "Chapter ", ~[#a]+.
 nonchapter: !"Chapter ", ~[#a]+.
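
Until something like that is available, the same triage can probably be spelled out by hand in plain ixml, one prefix character at a time. What follows is only a sketch and has not been run through a processor. The rule names (input, line, rest, notH and so on) are made up for illustration, and it assumes lines are separated by a single #a and that an empty line counts as a non-chapter line.

 input: line++-#a.
 -line: chapterLine; nonChapterLine.
 chapterLine: "Chapter ", rest.
 { Each hidden rule below covers one step of the prefix: the line ends here, }
 { or the next character breaks the prefix, or the prefix continues. }
 nonChapterLine: ; ~["C"; #a], rest; "C", notH.
 -notH: ; ~["h"; #a], rest; "h", notA.
 -notA: ; ~["a"; #a], rest; "a", notP.
 -notP: ; ~["p"; #a], rest; "p", notT.
 -notT: ; ~["t"; #a], rest; "t", notE.
 -notE: ; ~["e"; #a], rest; "e", notR.
 -notR: ; ~["r"; #a], rest; "r", notSp.
 { After "Chapter" there is no alternative that accepts a space, }
 { so a line beginning "Chapter " can only be a chapterLine. }
 -notSp: ; ~[" "; #a], rest.
 -rest: ~[#a]*.

At every step exactly one alternative can apply to a given next character, so each line should have a single parse. The cost is one extra rule per character of the prefix, which is the bookkeeping a 'not' construction would remove.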

Steven

On Friday 21 March 2025 04:05:45 (+01:00), David Birnbaum wrote:

> Dear ixml list,
> 
> Is there an ixml idiom for distinguishing lines of input according to whether they do or do not begin with a specific multicharacter pattern? For example, given consecutive lines, some of which begin with the string “Chapter ”, I’d like to recognize <chapterLine> and <nonChapterLine> elements. The former are easy, but I struggle to define a pattern that says “sequence of characters only when the first eight are not the string ‘Chapter ’”. That is, I don’t know how to match a nonChapterLine without also possibly matching a chapterLine. I can use ~["C"] to say “any *single* character that isn’t ‘C’”, but a nonChapterLine could begin with “C” (or, for that matter, with “Ch” or “Cha”, etc.). Assuming I can be confident that a nonChapterLine cannot begin with the eight-character sequence “Chapter ”, is it possible to construct an unambiguous grammar that will distinguish the two types of lines?
> 
> I can think of workarounds, such as tagging only chapter lines and then managing the untagged stuff between them in a separate pipeline step. But is it possible to tag all lines of either type unambiguously with just a single ixml grammar? Thank you for any suggestions.
> 
> Sincerely,
> 
> David
> 

Received on Friday, 21 March 2025 08:22:44 UTC