Re: draft-duerst-iri-07.txt: 2 week mailing list last call (IRIsyntax-28)

Hello Graham,

Many thanks for your comments. I'm responding in pieces,
as I split things up into issues. This is issue

At 12:02 04/05/10 +0100, Graham Klyne wrote:

>These comments are based on a quick skim rather than a detailed reading.
>Looking at this from an implementer's perspective, I feel it would be 
>helpful if the relationship between the IRI and URI *grammars* were more 
>clearly delineated;  e.g. a presentation of IRI syntax that is based on 
>the RFC2396bis grammar,

It definitely is. I have very carefully followed all the changes
in the RFC2396bis grammar. I very much hope I got it right, but
any additional crosschecks would be greatly appreciated.

>replacing a minimum number of productions.

I think that's also the case.

I think what you mean is to only change some terminal productions,
but leave the rest unchanged. There are several reasons why this
hasn't been done:

- In the most general sense, it would have been okay to just
   add ucschar to unreserved. But there is the issue of allowing
   private characters in query parts, but not elsewhere.
- Looking top down, what you seem to want may actually imply
   to use 'URI' rather than 'IRI' at the start of the grammar.
   I think that would have been more confusing than helpful.
   And once you start making these distinctions, it would
   then again be confusing to e.g. use 'path' rather than
- The current solution allows to very easily create a combined
   grammar of URIs and IRIs, which some people might want to do.
   If the two grammars would use the same symbols for things that
   are on  purpose the same, but in actual syntax are different,
   that wouldn't work.
- The grammar allows somebody who wants to only implement IRIs
   to do that directly. It may lead to less implementation
   divergence than a verbal description of differences against
   another spec.

>On this basis, it would be easier to see what needs to be changed in a URI 
>parser to yield an IRI parser.

I hope it's already easy enough to see. I have used very
straightforward naming conventions to correlate the corresponding
nonterminals in both grammars. If there is 'URI' in the original
nonterminal, I have changed that to 'IRI'. Otherwise, I have
prefixed an 'i'. If you think this is helpful, I can add a
sentence to that effect.

>Also, I note that the RFC2396bis grammar has been through several 
>revisions as subtle issues are exposed by review and implementation 
>experience;  by replicating the entire grammar (rather than saying that an 
>IRI is like a URI with designated changes), can you be confident that such 
>issues have not been re-introduced?

Of course, there is never any absolute confidence, but as far as
I remember, these issues were all related to details around
special cases such as e.g. empty paths. They are therefore
orthogonal to the issue of extending the repertoire of
'reserved' characters. One might be able to immagine some
interactions if e.g. authorities and paths were extended
in a different way, but the difference is for query parts,
which are very clearly delineated, and haven't been at
issue in the bugs you mention above.

I hope you are satisfied with this answer. If not, I would
appreciate a more detailled proposal.

Regards,   Martin.

Received on Tuesday, 11 May 2004 23:50:11 UTC