- From: John Dziurlaj <john@turnout.rocks>
- Date: Sun, 4 May 2025 13:51:28 +0000
- To: "public-ixml@w3.org" <public-ixml@w3.org>
- Message-ID: <DS7PR20MB39994FE2B46DDB1E1D2A6054C28F2@DS7PR20MB3999.namprd20.prod.outlook.com>
I am trying to parse the definitions section of the SGML specification. Each definition starts with a clause number (e.g. 4.2), and can run across multiple lines. I can handle cases where a given definition is contained on a single line. However, when the number of lines varies, I am lost as to what to do.

The following iXML grammar:

    definitions: definition,(-delimit,definition)*.
    definition: clause,ws,name,ws,description,(-delimit,note)?.
    clause: ["0"-"9"],".",["0"-"9"]+.
    name: ~[":"]+,-":".
    description: ~[#a;#d]+, ~["0"-"9"].
    note: "NOTE",~[#a;#d]+.
    -delimit: lf; cr.
    -ws: -[Zs]; tab; lf; cr.
    -tab: -#9.
    -lf: -#a.
    -cr: -#d.

can handle definitions like:

    4.63 control character: A character that controls the interpretation, presentation, or other processing of the characters that follow it; for example, a tab character.

but not like this:

    4.61 contextually required element: An element that is not a contextually optional element and
    a) whose generic identifier is the document type name; or
    b) whose currently applicable model token is a contextually required token.
    NOTE — An element could be neither contextually required nor contextually optional; for example, an element whose currently applicable model token is in an or group that has no inherently optional tokens.
    4.62 contextually required token: A content token that (definition partially omitted)

A line feed cannot be used to determine when a new definition begins; however, AFAIK there is no lookahead ability to check for the existence of a new clause (which always indicates a new definition).

Regards,
John Dziurłaj /d͡ʑurwaj/
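One possible workaround, offered as an assumption and not as an iXML solution, is to split the text outside the grammar: start a new chunk wherever a line begins with a clause number, then parse each chunk on its own. The Python below is a minimal sketch of that idea. The names `split_definitions` and `parse_definition` and the clause pattern (digits, a dot, digits, then whitespace at the start of a line) are hypothetical choices for illustration, not part of iXML or the SGML specification.

```python
import re

# Assumption: a new definition begins on any line that starts with a
# clause number such as "4.61" followed by whitespace.
CLAUSE_LINE = re.compile(r"^\d+\.\d+\s", re.MULTILINE)


def split_definitions(text):
    """Return one chunk per definition; a chunk may span several lines."""
    starts = [m.start() for m in CLAUSE_LINE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def parse_definition(chunk):
    """Split one chunk into (clause, name, body); body keeps its line breaks."""
    m = re.match(r"(\d+\.\d+)\s+([^:]+):\s*(.*)", chunk, re.DOTALL)
    if m is None:
        raise ValueError("chunk does not start with 'clause name:'")
    clause, name, body = m.groups()
    return clause, name.strip(), body.strip()
```

Each chunk returned by `split_definitions` could then be passed to a grammar whose start rule is `definition`, so the grammar no longer has to decide where one definition ends. The sketch would misfire if a continuation line inside a definition happened to begin with something that looks like a clause number.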
Received on Sunday, 4 May 2025 13:52:15 UTC