- From: Mark Davis <mark@macchiato.com>
- Date: Fri, 15 Nov 2002 13:16:09 -0800
- To: "Martin Duerst" <duerst@w3.org>
- Cc: <www-international@w3.org>, <mark@macchiato.com>
Received on Friday, 15 November 2002 16:16:49 UTC
One of the steps is the following:
3) Re-escape any octets that are not part of a strictly legal
UTF-8 octet sequence.
This needs to be clearer. Suppose you have the invalid sequence:
...<C2><C3><80>...
One could re-escape the entire sequence,
...%C2%C3%80...
or one could re-escape only the minimal-length invalid sequences, proceeding from right to left:
...%C2<C3><80>...
I assume that the latter is what is meant, but the text of the clause should say so explicitly. For that matter, any single octet above <7F> is invalid when taken on its own, so a perverse reading of the clause would require all of them to be escaped!
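To make the "minimal" reading concrete, here is a rough Python sketch. It is only my interpretation, not text from the draft: the function name reescape_invalid_utf8 is hypothetical, it scans left to right (which gives the same result for the example above), and it relies on Python's strict UTF-8 decoder to decide what counts as a legal sequence. Legal non-ASCII octets are printed in the <XX> notation used in this message.

```python
def reescape_invalid_utf8(data: bytes) -> str:
    """Write octets outside any legal UTF-8 sequence as %XX.

    Assumption: only the offending octet is escaped (the "minimal" reading).
    Legal ASCII octets are emitted as characters, legal multi-octet
    sequences as <XX><XX>... for readability.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # Sequence length implied by the lead octet; 0 means the octet
        # can never start a legal sequence (continuation, C0, C1, F5..FF).
        if b < 0x80:
            n = 1
        elif 0xC2 <= b <= 0xDF:
            n = 2
        elif 0xE0 <= b <= 0xEF:
            n = 3
        elif 0xF0 <= b <= 0xF4:
            n = 4
        else:
            n = 0
        chunk = data[i:i + n]
        legal = False
        if n and len(chunk) == n:
            try:
                # The strict decoder rejects overlong forms, surrogates,
                # and code points above U+10FFFF.
                chunk.decode("utf-8")
                legal = True
            except UnicodeDecodeError:
                pass
        if legal:
            if n == 1:
                out.append(chr(b))
            else:
                out.append("".join("<%02X>" % c for c in chunk))
            i += n
        else:
            out.append("%%%02X" % b)
            i += 1
    return "".join(out)


print(reescape_invalid_utf8(b"\xC2\xC3\x80"))  # %C2<C3><80>
```

Under this reading the stray <C2> is escaped by itself, while <C3><80> survives as a legal two-octet sequence.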
4) Re-escape all octets that in UTF-8 represent characters that
are not appropriate according to Section 5.1.
Should this not also say Section 4.1?
It is also unclear what to do with a sequence like %G1. Does it turn into %25G1?
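For illustration, one possible answer is to treat any "%" that is not followed by two hex digits as a literal percent sign and escape it as %25. The sketch below shows that reading only; the draft does not say this, and the name escape_stray_percent is hypothetical.

```python
import re

# A "%" that is NOT followed by exactly two hex digits is treated as stray.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def escape_stray_percent(s: str) -> str:
    """Escape stray percent signs as %25, leaving well-formed %XX alone."""
    return _STRAY_PERCENT.sub("%25", s)


print(escape_stray_percent("%G1"))     # %25G1
print(escape_stray_percent("%C3%80"))  # %C3%80
```

The alternative reading, leaving %G1 untouched, would need no code at all, which is why the text ought to say which one is intended.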
Mark