- From: John Dziurlaj <john@turnout.rocks>
- Date: Sun, 4 May 2025 13:51:28 +0000
- To: "public-ixml@w3.org" <public-ixml@w3.org>
- Message-ID: <DS7PR20MB39994FE2B46DDB1E1D2A6054C28F2@DS7PR20MB3999.namprd20.prod.outlook.com>
I am trying to parse the definitions section of the SGML specification. Each definition starts with a clause number (e.g. 4.2) and can run across multiple lines. I can handle definitions that fit on a single line, but I do not know how to handle definitions that span a varying number of lines.
The following iXML grammar:
definitions: definition,(-delimit,definition)*.
definition: clause,ws,name,ws,description,(-delimit,note)?.
clause: ["0"-"9"],".",["0"-"9"]+.
name: ~[":"]+,-":".
description: ~[#a;#d]+, ~["0"-"9"].
note: "NOTE",~[#a;#d]+.
-delimit: lf; cr.
-ws: -[Zs]; tab; lf; cr.
-tab: -#9.
-lf: -#a.
-cr: -#d.
This grammar can handle definitions like:
4.63 control character: A character that controls the interpretation, presentation, or other processing of the characters that follow it; for example, a tab character.
But not like this:
4.61 contextually required element: An element that is not a contextually optional element and
a) whose generic identifier is the document type name; or
b) whose currently applicable model token is a contextually required token.
NOTE — An element could be neither contextually required nor contextually optional; for example, an element whose currently applicable model token is in an or group that has no inherently optional tokens.
4.62 contextually required token: A content token that
(definition partially omitted)
A line feed cannot be used to determine when a new definition begins. However, as far as I know, iXML has no lookahead with which to check for the start of a new clause (which always indicates a new definition).
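To make the intended boundary rule concrete, here is a minimal sketch in Python rather than iXML. It is a pre-processing workaround, not a solution within the grammar. It assumes that a new definition begins wherever a line starts with a clause number followed by a space; the names `CLAUSE_AT_LINE_START` and `split_definitions` are hypothetical, and the sample text is taken from the examples above.

```python
import re

# Assumption: a definition starts at any line that begins with a clause
# number such as "4.61" followed by a space. Every other line is treated
# as a continuation of the current definition.
CLAUSE_AT_LINE_START = re.compile(r"^\d+\.\d+ ", re.MULTILINE)


def split_definitions(text: str) -> list[str]:
    """Return one string per definition, continuation lines included.

    Any text before the first clause number is dropped.
    """
    starts = [match.start() for match in CLAUSE_AT_LINE_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]


sample = (
    "4.61 contextually required element: An element that is not a "
    "contextually optional element and\n"
    "a) whose generic identifier is the document type name; or\n"
    "b) whose currently applicable model token is a contextually "
    "required token.\n"
    "4.62 contextually required token: A content token that\n"
)

definitions = split_definitions(sample)
for definition in definitions:
    # Show only the first line of each chunk to keep the output short.
    print(definition.splitlines()[0])
```

Each resulting chunk holds exactly one definition, so it could in principle be handed to a grammar that only needs to describe a single definition. The sketch would split in the wrong place if a continuation line happened to begin with something that looks like a clause number.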
Regards,
John Dziurłaj /d͡ʑurwaj/
Received on Sunday, 4 May 2025 13:52:15 UTC