Cheap and cheerful railroad diagrams

Hi folks,

Someone asked me (off list) about making railroad diagrams for ixml
grammars. They pointed me to Gunther Rademacher’s online tool:

  https://www.bottlecaps.de/rr/ui

That tool generates diagrams from grammars written in the W3C EBNF
notation. I did a little hacking by hand and produced a couple of
halfway interesting diagrams.

The W3C EBNF doesn’t have anything like inclusions or exclusions or
Unicode character classes. I haven’t opened up the hood on the RR
diagram library to see how hard it would be to create proper shapes for
those constructs. Instead, I just fake them as strings.

  ["-.·‿⁀"; Nd; Mn]

becomes a literal string in the RR diagram:

  INCL: "-.·‿⁀" | Nd | Mn

It’s not ideal, but I don’t think it’s *too* bad.
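
For anyone who wants to play with the same trick outside XSLT, here is a
rough Python sketch of the idea. It is not the actual stylesheet: the
fake_charset function and its arguments are invented for illustration,
and the #22 substitution for double quotes is an assumption based on how
the dchar and schar rules in ixml.ebnf come out.

```python
def fake_charset(members, exclude=False):
    """Render the members of an ixml character set as one quoted EBNF literal.

    `members` holds each member as it is written in ixml,
    e.g. ['"-.·‿⁀"', 'Nd', 'Mn'] for ["-.·‿⁀"; Nd; Mn].
    """
    label = "EXCL" if exclude else "INCL"
    text = f"{label}: " + " | ".join(members)
    # W3C EBNF literals have no escape mechanism, so pick a delimiter
    # that does not occur in the text.
    if "'" not in text:
        return f"'{text}'"
    # Assumption: when a single quote is present, wrap in double quotes
    # and spell any double quote inside as #22.
    return '"' + text.replace('"', "#22") + '"'


print(fake_charset(['"-.·‿⁀"', "Nd", "Mn"]))
print(fake_charset(['"{}"'], exclude=True))
```

The first call prints the INCL string shown above, wrapped in single
quotes so that it can be pasted into the EBNF as a literal.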

I’m also just leaving all the ixml marks out of the EBNF because I’m
not sure how to represent them.

The obvious next step was to write an XSLT transformation to produce the
approximated EBNF. Here’s the resulting ixml.ebnf, which you can paste
into the RR tool to see the diagrams:

ixml ::= (s prolog? rule (RS rule)* s)
s ::= (whitespace | comment)*
RS ::= (whitespace | comment)+
whitespace ::=  "INCL: Zs" | tab | lf | cr 
tab ::=  '#x9' 
lf ::=  '#xa' 
cr ::=  '#xd' 
comment ::= ('{' (cchar | comment)* '}')
cchar ::=  'EXCL: "{}"' 
prolog ::= (version s)
version ::= ('ixml' RS 'version' RS string s '.')
rule ::= (((mark s))? name s 'INCL: "=:"' s alts '.')
mark ::=  'INCL: "@^-"' 
alts ::=  alt (('INCL: ";|"' s) alt)* 
alt ::= (term ((',' s) term)*)? 
term ::=  factor | option | repeat0 | repeat1 
factor ::=  terminal | nonterminal | insertion | ('(' s alts ')' s)
repeat0 ::= factor (('*' s) | ('**' s sep))
repeat1 ::= factor (('+' s) | ('++' s sep))
option ::= (factor '?' s)
sep ::=  factor 
nonterminal ::= (((mark s))? name s)
name ::= (namestart(namefollower ( namefollower)*)? )
namestart ::=  'INCL: "_" | L' 
namefollower ::=  namestart | 'INCL: "-.·‿⁀" | Nd | Mn' 
terminal ::=  literal | charset 
literal ::=  quoted | encoded 
quoted ::= (((tmark s))? string s)
tmark ::=  'INCL: "^-"' 
string ::= ('"' dchar ( dchar)* '"') | ("'" schar ( schar)* "'")
dchar ::=  "EXCL: '#22' | #xa | #xd" | ('"' '"')
schar ::=  "EXCL: #22'#22 | #xa | #xd" | ("'" "'")
encoded ::= (((tmark s))? '#' hex s)
hex ::=  "INCL: [0-9] | [a-f] | [A-F]" ( "INCL: [0-9] | [a-f] | [A-F]")* 
charset ::=  inclusion | exclusion 
inclusion ::= (((tmark s))? set)
exclusion ::= (((tmark s))? '~' s set)
set ::= ('[' s((member s) (('INCL: ";|"' s) (member s))*)? ']' s)
member ::=  string | ('#' hex) | range | class 
range ::= (from s '-' s to)
from ::=  character 
to ::=  character 
character ::= ('"' dchar '"') | ("'" schar "'") | ('#' hex)
class ::=  code 
code ::= (capital letter? )
capital ::=  "INCL: [A-Z]" 
letter ::=  "INCL: [a-z]" 
insertion ::= ('+' s (string | ('#' hex)) s)
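
One thing to watch when generating EBNF like this: in W3C EBNF the
postfix operators bind tighter than sequence, and sequence binds tighter
than "|", so alternatives nested under a sequence or a repetition need
explicit parentheses. Below is a minimal Python sketch of a serializer
that takes precedence into account. It is not the XSLT that produced the
grammar above; the tuple-based tree and the to_ebnf name are made up for
illustration.

```python
def to_ebnf(node, min_prec=0):
    """Serialize a tiny grammar tree to W3C EBNF, adding parentheses as needed.

    A node is a string (a name or an already quoted literal) or a tuple:
    ("alt", a, b, ...), ("seq", a, b, ...), or ("*" | "+" | "?", a).
    """
    if isinstance(node, str):
        return node
    kind, *args = node
    if kind == "alt":
        text, prec = " | ".join(to_ebnf(a, 1) for a in args), 0
    elif kind == "seq":
        text, prec = " ".join(to_ebnf(a, 2) for a in args), 1
    elif kind in ("*", "+", "?"):
        # The operand of a postfix operator must be atomic or parenthesized.
        text, prec = to_ebnf(args[0], 3) + kind, 2
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    # Parenthesize when this node binds more loosely than its context allows.
    return f"({text})" if prec < min_prec else text


# ixml: RS: (whitespace; comment)+.
print(to_ebnf(("+", ("alt", "whitespace", "comment"))))
# prints: (whitespace | comment)+

# ixml: repeat0: factor, (-"*", s; -"**", s, sep).
print(to_ebnf(("seq", "factor",
               ("alt", ("seq", "'*'", "s"), ("seq", "'**'", "s", "sep")))))
# prints: factor ('*' s | '**' s sep)
```

A separated repetition such as alt++sep can be built from the same
pieces as ("seq", "alt", ("*", ("seq", "sep", "alt"))), which serializes
to alt (sep alt)*.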

                                        Be seeing you,
                                          norm

--
Norm Tovey-Walsh
Saxonica

Received on Thursday, 1 September 2022 09:28:55 UTC