- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Thu, 01 Sep 2022 10:14:44 +0100
- To: ixml <public-ixml@w3.org>
- Message-ID: <m24jxrjvei.fsf@saxonica.com>
Hi folks,
Someone asked me (off list) about making railroad diagrams for ixml
grammars. They pointed me to Gunther Rademacher’s online tool:
https://www.bottlecaps.de/rr/ui
That tool creates diagrams by parsing the W3C EBNF format. I did a
little hacking by hand and produced a couple of halfway interesting
diagrams.
The W3C EBNF doesn’t have anything like inclusions or exclusions or
Unicode character classes. I haven’t opened up the hood on the RR
diagram library to see how hard it would be to create proper shapes for
those constructs. Instead, I just fake them as strings. For example,
  ["-.·‿⁀"; Nd; Mn]
becomes a literal string in the RR diagram:
  INCL: "-.·‿⁀" | Nd | Mn
It’s not ideal, but I don’t think it’s *too* bad.
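For anyone who wants to reproduce that approximation, here is a minimal sketch in Python. It is not the code used to make the diagrams. The function names split_members and charset_to_label are made up for illustration, and the quoting rule (prefer double quotes, fall back to single quotes, write #22 for a double quote when both kinds occur) is an assumption inferred from strings such as "INCL: Zs" and 'EXCL: "{}"'.

```python
def split_members(body: str) -> list[str]:
    """Split the inside of an ixml set on ';' or '|', ignoring quoted text."""
    members, current, quote = [], [], None
    for ch in body:
        if quote:
            # Inside a string: copy everything until the closing quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            current.append(ch)
        elif ch in ";|":
            members.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    members.append("".join(current).strip())
    return [m for m in members if m]


def charset_to_label(charset: str) -> str:
    """Render an ixml inclusion or exclusion as a single EBNF string literal."""
    text = charset.strip()
    kind = "INCL"
    if text.startswith("~"):
        kind = "EXCL"
        text = text[1:].strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not an ixml character set: {charset!r}")
    label = f"{kind}: " + " | ".join(split_members(text[1:-1]))
    # Assumed quoting rule: pick a delimiter that does not occur in the label.
    if '"' not in label:
        return f'"{label}"'
    if "'" not in label:
        return f"'{label}'"
    # Both quote characters occur: spell the double quote as #22.
    return '"' + label.replace('"', "#22") + '"'


print(charset_to_label('["-.·‿⁀"; Nd; Mn]'))
print(charset_to_label('~["{}"]'))
```

The result is only a label for the diagram; the RR tool treats it as an ordinary string and knows nothing about what it stands for.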
I’m also just ignoring all the marks in the EBNF because I’m not sure
how to represent them.
The obvious next step was to write an XSLT transformation to produce the
approximated EBNF. Here’s ixml.ebnf that you can paste into the RR tool
to see the diagrams:
ixml ::= (s prolog? rule (RS rule)* s)
s ::= ((whitespace | comment) (whitespace | comment)*)?
RS ::= (whitespace | comment) (whitespace | comment)*
whitespace ::= "INCL: Zs" | tab | lf | cr
tab ::= '#x9'
lf ::= '#xa'
cr ::= '#xd'
comment ::= ('{' ((cchar | comment) (cchar | comment)*)? '}')
cchar ::= 'EXCL: "{}"'
prolog ::= (version s)
version ::= ('ixml' RS 'version' RS string s '.')
rule ::= (((mark s))? name s 'INCL: "=:"' s alts '.')
mark ::= 'INCL: "@^-"'
alts ::= alt (('INCL: ";|"' s) alt)*
alt ::= (term ((',' s) term)*)?
term ::= factor | option | repeat0 | repeat1
factor ::= terminal | nonterminal | insertion | ('(' s alts ')' s)
repeat0 ::= (factor (('*' s) | ('**' s sep)))
repeat1 ::= (factor (('+' s) | ('++' s sep)))
option ::= (factor '?' s)
sep ::= factor
nonterminal ::= (((mark s))? name s)
name ::= (namestart(namefollower ( namefollower)*)? )
namestart ::= 'INCL: "_" | L'
namefollower ::= namestart | 'INCL: "-.·‿⁀" | Nd | Mn'
terminal ::= literal | charset
literal ::= quoted | encoded
quoted ::= (((tmark s))? string s)
tmark ::= 'INCL: "^-"'
string ::= ('"' dchar ( dchar)* '"') | ("'" schar ( schar)* "'")
dchar ::= "EXCL: '#22' | #xa | #xd" | ('"' '"')
schar ::= "EXCL: #22'#22 | #xa | #xd" | ("'" "'")
encoded ::= (((tmark s))? '#' hex s)
hex ::= "INCL: [0-9] | [a-f] | [A-F]" ( "INCL: [0-9] | [a-f] | [A-F]")*
charset ::= inclusion | exclusion
inclusion ::= (((tmark s))? set)
exclusion ::= (((tmark s))? '~' s set)
set ::= ('[' s((member s) (('INCL: ";|"' s) (member s))*)? ']' s)
member ::= string | ('#' hex) | range | class
range ::= (from s '-' s to)
from ::= character
to ::= character
character ::= ('"' dchar '"') | ("'" schar "'") | ('#' hex)
class ::= code
code ::= (capital letter? )
capital ::= "INCL: [A-Z]"
letter ::= "INCL: [a-z]"
insertion ::= ('+' s (string | ('#' hex)) s)
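The separated repetitions in the listing (alts, alt, set) all follow one pattern: ixml's item++sep becomes item (sep item)*, and item**sep is the same thing made optional. Here is a hypothetical Python sketch of that rewriting. It is not the XSLT, and the helper names has_top_level_choice, group, repeat1_sep and repeat0_sep are invented for illustration.

```python
def has_top_level_choice(expr: str) -> bool:
    """True if expr contains a '|' outside parentheses and string literals."""
    depth, quote = 0, None
    for ch in expr:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "|" and depth == 0:
            return True
    return False


def group(expr: str) -> str:
    """Parenthesise expr only when a bare choice would change its meaning."""
    return f"({expr})" if has_top_level_choice(expr) else expr


def repeat1_sep(item: str, sep: str) -> str:
    """ixml item++sep: one or more items separated by sep."""
    item, sep = group(item), group(sep)
    return f"{item} ({sep} {item})*"


def repeat0_sep(item: str, sep: str) -> str:
    """ixml item**sep: zero or more items separated by sep."""
    return f"({repeat1_sep(item, sep)})?"


# The right-hand sides of alts and alt, rebuilt from their ixml shapes.
print(repeat1_sep("alt", "('INCL: \";|\"' s)"))
print(repeat0_sep("term", "(',' s)"))
```

The grouping step matters because | binds more loosely than sequencing in the W3C notation, so an item that is itself a choice has to be wrapped before it is repeated. The plain * and + operators exist in both notations and need no such expansion.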
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Received on Thursday, 1 September 2022 09:28:55 UTC