- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Sun, 04 May 2025 15:50:21 +0100
- To: John Dziurlaj <john@turnout.rocks>
- Cc: "public-ixml@w3.org" <public-ixml@w3.org>
John Dziurlaj <john@turnout.rocks> writes:

> I am trying to parse the definitions section of the SGML specification. Each definition starts with a clause number (e.g. 4.2), and can run across multiple lines. I can handle cases where a given definition is contained on a single line. However, when the number of lines varies, I am lost as to what to do.

Is it the case that a clause begins with a number:

    3.14 This is the start of a clause
    It can have lots of stuff in it
    NOTE maybe a note
    3.15 This is the next clause…

And you want to capture everything in each clause? Or is there more variation in the data?

> description: ~[#a;#d]+, ~["0"-"9"].

This says a description is “an arbitrary number of characters that aren’t #a or #d followed by a character that isn’t 0-9”. Is that an attempt to exclude the next clause number? Given the presence of NOTEs, I’m not sure that’s going to be sufficient.

> A line feed cannot be used to determine when a new definition begins; however, AFAIK there is no lookahead ability to check for the existence of a new clause (which always indicates a new definition).

Indeed, there’s no lookahead. You can’t peek forward without consuming.

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica
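
Since the grammar cannot peek ahead, one possible workaround is to split the text into clauses before handing it to the parser. The following is only a minimal sketch in Python, not iXML, and it rests on assumptions that are not confirmed by the data: that a clause number is a run of dot-separated digits at the very start of a line, followed by whitespace, and that no continuation line begins that way. The names `CLAUSE_START` and `split_clauses` are hypothetical.

```python
import re

# Assumption: a clause number is dot-separated digits at the start of a line,
# followed by spaces or tabs (e.g. "3.14 "). Continuation lines and NOTE
# lines are assumed never to start with such a number.
CLAUSE_START = re.compile(r"^(\d+(?:\.\d+)+)[ \t]+", re.MULTILINE)


def split_clauses(text):
    """Return a list of (number, body) pairs, one per clause."""
    matches = list(CLAUSE_START.finditer(text))
    clauses = []
    for i, m in enumerate(matches):
        # A clause body runs up to the start of the next clause number,
        # or to the end of the text for the last clause.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        clauses.append((m.group(1), text[m.end():end].strip()))
    return clauses


sample = """3.14 This is the start of a clause
It can have lots of stuff in it
NOTE maybe a note
3.15 This is the next clause
"""

for number, body in split_clauses(sample):
    print(number, repr(body))
```

Each body could then be parsed on its own, where the end of the input marks the end of the clause and no lookahead is needed.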
Received on Sunday, 4 May 2025 14:50:28 UTC