- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Wed, 16 Feb 2022 18:32:52 +0000
- To: ixml <public-ixml@w3.org>
- Message-ID: <m2zgmqvemq.fsf@saxonica.com>
Hello world,

Michael mentioned a BibTeX grammar in passing and I thought that might be interesting to look at. I expect it’s going to be quite difficult to get right because I imagine that TeX slips in there somewhere. But I thought I’d see if I could get started. I need more experience writing grammars to be able to understand how to debug them anyway.

And I’ve already got myself confused. The good news is that my parser and Steven’s agree, so it probably is a grammar error and not an implementation bug.

Here’s the grammar:

bibtex: item+itemsep, s* .
itemsep: s+ .
-item: comment ; entry .
comment: -"%", cchar* .
entry: -"@", @type, s*, -'{', s*, @citekey, fields?, s* -'}' .
type: -name .
citekey: -name .
-fields: -',', field+fsep .
field: s*, name, s*, -'=', s*, value .
value: quotedvalue; bracedvalue; atomicvalue .
-quotedvalue: -'"', qvalue, -'"' .
-bracedvalue: -'{', bvalue, -'}' .
-atomicvalue: ["0"-"9"]+ .
qvalue: qchar* .
bvalue: bchar* .
-fsep: s*, -',', s* .
name: namestart, namefollower* .
-cchar: ~[#a] .
-qchar: ~['"'] .
-bchar: ~['}'] .
-namestart: ["_"; L] .
-namefollower: namestart; ["-.·‿⁀"; Nd; Mn] .
-s: (-[Zs]; -#9; -#d; -#a) .

And here’s a toy file that parses:

%C
@Book{K,
  t = "a"
}

@Book{K,
  t = "a"
}

If you take the space out between the two entries, it fails. And I don’t see why.

Sparing you the gory details of the state chart, my parser says:

<parse-failed xmlns:ixml="http://invisiblexml.org/NS" ixml:state="failed">
   <last-token line="5" column="1" token-count="25">'@'</last-token>
</parse-failed>

Steven’s says:

<!-- **** Parsing failed at line 5, position 1
@Book{K,
^
**** Permitted at this position: "}"; #9; #a; #d; [Zs].
**** I see: "@".
-->
<fail ixml:state="failed" xmlns:ixml="http://invisiblexml.org/NS">
   <line>5</line>
   <pos>1</pos>
   <line>@Book{K,</line>
   <permitted>"}"; #9; #a; #d; [Zs]</permitted>
</fail>

(Aside: I need to see about peeking in the state chart to work out what I think could have come next, if I can. Steven has better diagnostics than I do.)

The parsers agree about where the problem is. The fact that Steven’s parser says a “}” could come there suggests something else is consuming the “}” that closes the previous entry, but I don’t see what that can be.

Please point at my grammar and laugh derisively in my direction :-)

Be seeing you,
  norm

--
Norm Tovey-Walsh
Saxonica
Received on Wednesday, 16 February 2022 18:41:03 UTC