Parsing and lookahead

I am trying to parse the definitions section of the SGML specification. Each definition starts with a clause number (e.g. 4.2) and can run across multiple lines. I can handle definitions that fit on a single line. However, when a definition spans a varying number of lines, I do not know how to proceed.

The following iXML grammar:

definitions: definition,(-delimit,definition)*.
  definition: clause,ws,name,ws,description,(-delimit,note)?.
      clause: ["0"-"9"],".",["0"-"9"]+.
        name: ~[":"]+,-":".
 description: ~[#a;#d]+, ~["0"-"9"].
        note: "NOTE",~[#a;#d]+.
    -delimit: lf; cr.
         -ws: -[Zs]; tab; lf; cr.
        -tab: -#9.
         -lf: -#a.
         -cr: -#d.

It can handle single-line definitions like:

4.63 control character: A character that controls the interpretation, presentation, or other processing of the characters that follow it; for example, a tab character.

But not multi-line ones like this:

4.61 contextually required element: An element that is not a contextually optional element and
a) whose generic identifier is the document type name; or
b) whose currently applicable model token is a contextually required token.
NOTE — An element could be neither contextually required nor contextually optional; for example, an element whose currently applicable model token is in an or group that has no inherently optional tokens.
4.62 contextually required token: A content token that

(definition partially omitted)

A line feed cannot be used to determine when a new definition begins, because a definition may itself contain line feeds. However, AFAIK iXML has no lookahead with which to check whether a new clause number (which always indicates a new definition) comes next.
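
To illustrate the kind of lookahead I mean, here is a rough Python sketch of a preprocessing step done outside iXML. It is not an iXML solution. The function name split_definitions is hypothetical, and the sketch assumes that a clause number (one digit, a dot, one or more digits) always starts a line and is followed by whitespace, mirroring the clause rule above.

```python
import re

# Zero-width lookahead: matches the position at the start of any line
# that begins with a clause number such as "4.61 ".
# Assumption: clause numbers look like the `clause` rule in the grammar
# (one digit, ".", one or more digits) and are followed by whitespace.
CLAUSE_START = re.compile(r"^(?=[0-9]\.[0-9]+\s)", re.MULTILINE)


def split_definitions(text):
    """Split text into chunks, each beginning with a clause number.

    Any text before the first clause number is dropped.
    """
    starts = [m.start() for m in CLAUSE_START.finditer(text)]
    bounds = starts + [len(text)]
    # Each definition runs from one clause start to the next one.
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    sample = (
        "4.61 contextually required element: An element that is not a "
        "contextually optional element and\n"
        "a) whose generic identifier is the document type name; or\n"
        "b) whose currently applicable model token is a contextually "
        "required token.\n"
        "4.62 contextually required token: A content token that\n"
    )
    for chunk in split_definitions(sample):
        print(repr(chunk))
```

Each chunk could then be passed to a grammar that parses a single definition. One limitation: a continuation line that happened to begin with something shaped like a clause number would be split incorrectly.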

Regards,

John Dziurłaj /d͡ʑurwaj/

Received on Sunday, 4 May 2025 13:52:15 UTC