Announcing draft-duerst-iri-02.txt

One of the steps is the following:

       3) Re-escape any octets that are not part of a strictly legal UTF-
          8 octet sequence.

This needs to be clearer. Suppose you have the invalid sequence:

...<C2><C3><80>...

One could re-escape the entire sequence:

...%C2%C3%80...

or one could re-escape only the minimal-length invalid sequences, proceeding from right to left:

...%C2<C3><80>...

I assume that the latter is what is meant, but the text of the clause should say so explicitly. For that matter, any octet above <7F> is invalid when taken on its own, so a perverse reading of the clause would require all of them to be escaped!
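
To make the second reading concrete, here is a minimal Python sketch of "escape only the octets that cannot start a legal UTF-8 sequence". The name reescape_invalid_utf8 is hypothetical and does not come from the draft. The sketch scans left to right rather than right to left (which gives the same result for the example above) and relies on Python's strict UTF-8 decoder to decide what is legal.

```python
def _is_legal(chunk: bytes) -> bool:
    """True if chunk decodes under Python's strict UTF-8 decoder."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reescape_invalid_utf8(data: bytes) -> bytes:
    """Percent-escape each octet that is not part of a legal UTF-8 sequence.

    Assumption: "minimal" means one octet at a time. An octet is escaped
    only if no legal sequence starts at its position.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        # Expected sequence length, judged from the lead octet alone.
        if b < 0x80:
            n = 1
        elif 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0  # stray continuation octet, or an octet that is never legal
        chunk = data[i:i + n]
        if n and len(chunk) == n and _is_legal(chunk):
            out += chunk  # keep the legal sequence as raw octets
            i += n
        else:
            out += ("%%%02X" % b).encode("ascii")  # escape this one octet
            i += 1
    return bytes(out)


# The example from this message: <C2><C3><80> becomes %C2<C3><80>.
result = reescape_invalid_utf8(b"\xC2\xC3\x80")
```

Under this reading <C3><80> survives as a raw two-octet sequence, and only the dangling <C2> is escaped.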

       4) Re-escape all octets that in UTF-8 represent characters that
          are not appropriate according to Section 5.1.

Should this not also say Section 4.1?

It is also unclear what to do with a sequence like %G1. Does it turn into %25G1?
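
For illustration only, here is what the %25G1 reading would look like as a Python sketch. The function name escape_stray_percent is hypothetical, and the rule it applies (escape any "%" that is not followed by two hexadecimal digits) is an assumption about one possible answer, not something the draft states.

```python
import re

# A "%" that is NOT followed by exactly two hex digits is treated as a
# literal percent sign and escaped as %25. Well-formed escapes are kept.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_stray_percent(text: str) -> str:
    """Escape percent signs that do not begin a well-formed %HH escape."""
    return _STRAY_PERCENT.sub("%25", text)


malformed = escape_stray_percent("%G1")       # "%" is not a valid escape here
wellformed = escape_stray_percent("%C3%80")   # left untouched
```

The alternative readings, such as rejecting the input or passing %G1 through unchanged, would need a different rule, so the draft should say which is intended.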

Mark

Received on Friday, 15 November 2002 16:16:49 UTC