Some of the suggestions in this issue seem to me to make sense; others do
not.
Our judgement may depend on what we think the purpose of the exercise is.
My goal was an ixml translation of the grammar in the RFC, with marks to
make the XML nicer (for some subjective judgement of 'niceness'). I did not
think the goal was to suggest improvements to the normative grammar in the
RFC.
Absolutely understood. But as I also said elsewhere, published syntaxes
typically define what is correct, while our aim is to expose structure.
The imperfect syntax of ihost in RFC 3987 is a case in point.
I don't object in principle to a sample grammar that deviates in
well-defined ways from the normative grammar for the language in question, but I
think it needs to be strongly motivated and the deviations clearly
explained. If we think, for example, that the ixml grammar would be more
useful if we made host and ihost unambiguous, or if ireg-name were defined
as
    ireg-name = label ++ ".".
    label = ...
or as
    ireg-name = (sub-domain ** ".", ".")?, TLD.
    sub-domain = label.
    TLD = label.
    -label = ...
then we can do so, but we need to explain (first to each other and then to
the public) why we think that's more helpful and what class of domain names
will be grammatical in the normative grammar but ungrammatical in ours, or
vice versa, and why we think deviating from the normative spec for those
domain names will probably not matter in practice. So far, I haven't seen
any reason to change my understanding of the goal of these grammars.
I think one principle of ixml-supplied grammars should be that you never
need to reparse any subtree.
It would probably be better to make IRI-reference the start symbol for the
IRI grammar.
Sounds good; several other nonterminals then become reachable (but still
not absolute-IRI, ipath, reserved, gen-delims, CR, DQUOTE, LF, and SP).
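
For anyone who wants to repeat this kind of reachability check, here is a
minimal sketch. It assumes the grammar has already been reduced to a map from
each nonterminal to the nonterminals its rule mentions. TOY_RULES and the
function names are hypothetical, made up for illustration; the rules are not
the RFC 3987 ones.

```python
from collections import deque

def reachable(rules, start):
    """Return the set of nonterminals reachable from `start` (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        name = queue.popleft()
        # Nonterminals without a rule of their own simply have no references.
        for ref in rules.get(name, ()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen

def unreachable(rules, start):
    """Return the defined nonterminals that `start` never reaches, sorted."""
    return sorted(set(rules) - reachable(rules, start))

# Hypothetical toy grammar: nonterminal -> nonterminals used in its rule.
TOY_RULES = {
    "reference": ["full", "relative"],
    "full": ["scheme", "path"],
    "relative": ["path"],
    "absolute": ["scheme", "path"],
    "scheme": [],
    "path": [],
    "SP": [],
}

if __name__ == "__main__":
    print(unreachable(TOY_RULES, "full"))
    print(unreachable(TOY_RULES, "reference"))
```

In this toy grammar, moving the start symbol from "full" to "reference"
shrinks the unreachable set, but "absolute" and "SP" stay unreachable because
no rule refers to them.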
Steven