- From: Norm Tovey-Walsh <norm@saxonica.com>
- Date: Thu, 01 Sep 2022 14:11:53 +0100
- To: Steven Pemberton <steven.pemberton@cwi.nl>
- Cc: public-ixml@w3.org
- Message-ID: <m2v8q7i5y1.fsf@saxonica.com>
Steven Pemberton <steven.pemberton@cwi.nl> writes:
> I see some errors, but it's a nice start!
> I think ONEOF is more readable than INCL, and NONEOF for EXCL
Okay. I’m still trying to decide if I think it’s worth trying to get
better support in the RR generator.
> Something goes wrong with repeat0 and repeat1
Fixed in my follow-up post, or still wrong?
> Something goes wrong with single quotes (see string as an
> example).
Yes. They seem to disappear entirely.
> You could split individual members of a character set, so that
> ["0"-"9"; "a"-"f"; "A"-"F"] appears as INCL "0"-"9"; INCL
> "a"-"f"; INCL "A"-"F". This might give a better looking
> diagram.
Perhaps. I should probably put quotes into the ranges too.
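As a rough illustration of the splitting suggested above, here is a minimal Python sketch. It is not how the conversion is actually done; the function names `split_members` and `label_members` are hypothetical, and it assumes members are separated by ";" or "|" outside of quoted strings and that only inclusions (no leading "~") are handled.

```python
def split_members(charset: str) -> list[str]:
    """Split an ixml character set such as ["0"-"9"; "a"-"f"] into its members."""
    body = charset.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError(f"not a character set: {charset!r}")
    body = body[1:-1]

    members, current, quote = [], [], None
    for ch in body:
        if quote:                      # inside a quoted string: copy verbatim
            current.append(ch)
            if ch == quote:            # a doubled quote simply closes and reopens
                quote = None
        elif ch in "\"'":              # start of a quoted string
            quote = ch
            current.append(ch)
        elif ch in ";|":               # member separator outside quotes
            members.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    members.append("".join(current).strip())
    return [m for m in members if m]   # drop empty pieces, e.g. from "[]"


def label_members(charset: str, keyword: str = "INCL") -> str:
    """Render each member as its own labelled item (keyword is an assumption)."""
    return "; ".join(f"{keyword} {m}" for m in split_members(charset))


print(label_members('["0"-"9"; "a"-"f"; "A"-"F"]'))
```

Passing keyword="ONEOF" would give the more readable label proposed earlier in the thread.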
> The rule for insertion is wrong.
Indeed. Failed to put parens around the alts.
Here’s a better version:
ixml ::= (s prolog? rule (RS rule)* s)
s ::= ( (whitespace | comment))*
RS ::= ( (whitespace | comment))+
whitespace ::= "INCL: Zs" | tab | lf | cr
tab ::= '#x9'
lf ::= '#xa'
cr ::= '#xd'
comment ::= ("{" ( (cchar | comment))* "}")
cchar ::= "EXCL: {}"
prolog ::= (version s)
version ::= ("ixml" RS "version" RS string s ".")
rule ::= (( ((mark s)))? name s "INCL: =:" s alts ".")
mark ::= "INCL: @^-"
alts ::= alt ( (("INCL: ;|" s)) alt)*
alt ::= (term ( (("," s)) term)*)?
term ::= factor | option | repeat0 | repeat1
factor ::= terminal | nonterminal | insertion | ("(" s alts ")" s)
repeat0 ::= (factor (("*" s) | ("**" s sep)))
repeat1 ::= (factor (("+" s) | ("++" s sep)))
option ::= (factor "?" s)
sep ::= factor
nonterminal ::= (( ((mark s)))? name s)
name ::= (namestart (namefollower)* )
namestart ::= "INCL: _ | L"
namefollower ::= namestart | "INCL: -.·‿⁀ | Nd | Mn"
terminal ::= literal | charset
literal ::= quoted | encoded
quoted ::= (( ((tmark s)))? string s)
tmark ::= "INCL: ^-"
string ::= ('"' (dchar)+ '"') | ("'" (schar)+ "'")
dchar ::= 'EXCL: " | #xa | #xd' | ('"' '"')
schar ::= "EXCL: ' | #xa | #xd" | ("'" "'")
encoded ::= (( ((tmark s)))? "#" hex s)
hex ::= ('INCL: ["0"-"9"] | ["a"-"f"] | ["A"-"F"]')+
charset ::= inclusion | exclusion
inclusion ::= (( ((tmark s)))? set)
exclusion ::= (( ((tmark s)))? "~" s set)
set ::= ("[" s( ((member s)) ( (("INCL: ;|" s)) ((member s)))*)? "]" s)
member ::= string | ("#" hex) | range | class
range ::= (from s "-" s to)
from ::= character
to ::= character
character ::= ('"' dchar '"') | ("'" schar "'") | ("#" hex)
class ::= code
code ::= (capital letter? )
capital ::= 'INCL: ["A"-"Z"]'
letter ::= 'INCL: ["a"-"z"]'
insertion ::= ("+" s (string | ("#" hex))s)
Be seeing you,
norm
--
Norm Tovey-Walsh
Saxonica
Received on Thursday, 1 September 2022 13:24:09 UTC