A BibTeX grammar

Hello world,

Michael mentioned a BibTeX grammar in passing and I thought that might
be interesting to look at. I expect it’s going to be quite difficult to
get right because I imagine that TeX slips in there somewhere.

But I thought I’d see if I could get started. I need more experience
writing grammars to be able to understand how to debug them anyway.

And I’ve already got myself confused. The good news is that my parser
and Steven’s agree, so it probably is a grammar error and not an
implementation bug.

Here’s the grammar:

bibtex: item+itemsep, s* .
itemsep: s+ .

-item: comment ; entry .

comment: -"%", cchar* .

entry: -"@", @type, s*, -'{', s*, @citekey, fields?, s* -'}' .

type: -name .
citekey: -name .

-fields: -',', field+fsep .

field: s*, name, s*, -'=', s*, value .
value: quotedvalue; bracedvalue; atomicvalue .

-quotedvalue: -'"', qvalue, -'"' .
-bracedvalue: -'{', bvalue, -'}' .
-atomicvalue: ["0"-"9"]+ .

qvalue: qchar* .
bvalue: bchar* .

-fsep: s*, -',', s* .

name: namestart, namefollower* .

-cchar: ~[#a] .
-qchar: ~['"'] .
-bchar: ~['}'] .
-namestart: ["_"; L] .
-namefollower: namestart; ["-.·‿⁀"; Nd; Mn] .

-s: (-[Zs]; -#9; -#d; -#a) .

And here’s a toy file that parses:

%C
@Book{K,
  t = "a"
}

@Book{K,
  t = "a"
}

If you take out the blank line between the two entries, it fails. And I
don’t see why. Sparing you the gory details of the state chart, my
parser says:

<parse-failed xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed">
   <last-token line="5" column="1" token-count="25">'@'</last-token>
</parse-failed>

Steven’s says:

<!--
  **** Parsing failed at line 5, position 1
  @Book{K,
  ^
  **** Permitted at this position: "}"; #9; #a; #d; [Zs].
  **** I see: "@".
-->
<fail ixml:state="failed" xmlns:ixml="http://invisiblexml.org/NS">
   <line>5</line>
   <pos>1</pos>
   <line>@Book{K,</line>
   <permitted>"}"; #9; #a; #d; [Zs]</permitted>
</fail>
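
For anyone who wants to reproduce this, here is a minimal Python sketch
that builds both variants of the toy file and checks where the second
"@" lands in each. The names (COMMENT, ENTRY, position_of_second_at and
so on) are purely illustrative, the trailing newline at the end of the
file is an assumption, and the script does not run either parser.

```python
# Build the two toy inputs: the one that parses and the one that fails.
# All names are illustrative; nothing here calls an ixml parser.
# Assumption: each entry, including the last, ends with a newline.

COMMENT = "%C\n"
ENTRY = '@Book{K,\n  t = "a"\n}\n'

# Variant that parses: a blank line separates the two entries.
with_blank_line = COMMENT + ENTRY + "\n" + ENTRY

# Variant that fails: the second entry starts on the line right after "}".
without_blank_line = COMMENT + ENTRY + ENTRY


def position_of_second_at(text):
    """Return the 1-based (line, column) of the second "@" in text."""
    first = text.index("@")
    second = text.index("@", first + 1)
    line = text.count("\n", 0, second) + 1
    column = second - (text.rfind("\n", 0, second) + 1) + 1
    return line, column


print(position_of_second_at(with_blank_line))
print(position_of_second_at(without_blank_line))
```

In the failing variant the second "@" sits at line 5, column 1, which is
where both parsers give up. It is also the 25th character of that input,
which seems to line up with the token-count of 25 above.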

(Aside: I need to look into peeking at the state chart to work out what
could have come next, if I can. Steven has better diagnostics than I
do.)
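
A rough sketch of what that peeking might look like, assuming an
Earley-style chart in which each item is a rule with a dot in it. The
Item class, the permitted function and the toy state are all
hypothetical and not taken from either parser. The idea is only that
the terminals sitting immediately after a dot in the last non-empty
state are the ones that could have come next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical Earley item: lhs -> rhs with a dot at index `dot`."""
    lhs: str
    rhs: tuple
    dot: int


def permitted(items, nonterminals):
    """Collect the terminals that appear immediately after a dot."""
    expected = set()
    for item in items:
        if item.dot < len(item.rhs):          # skip completed items
            symbol = item.rhs[item.dot]
            if symbol not in nonterminals:    # keep terminals only
                expected.add(symbol)
    return sorted(expected)


# Toy state for a made-up grammar:
#   list: "(", item, ")" .   item: "a"; "b" .
nonterminals = {"list", "item"}
state = [
    Item("list", ('"("', "item", '")"'), 1),  # dot before a nonterminal
    Item("item", ('"a"',), 0),
    Item("item", ('"b"',), 0),
]

print(permitted(state, nonterminals))
```

A real chart would also need to handle character classes and exclusions
rather than plain strings, so treat this as the shape of the idea only.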

The parsers agree about where the problem is. The fact that Steven’s
parser says a “}” could come there suggests something else is consuming
the “}” that closes the previous entry, but I don’t see what that can
be.

Please point at my grammar and laugh derisively in my direction :-)

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Wednesday, 16 February 2022 18:41:03 UTC