- From: Mark Davis <mark@macchiato.com>
- Date: Fri, 15 Nov 2002 13:16:09 -0800
- To: "Martin Duerst" <duerst@w3.org>
- Cc: <www-international@w3.org>, <mark@macchiato.com>
Received on Friday, 15 November 2002 16:16:49 UTC
One of the steps is the following:

3) Re-escape any octets that are not part of a strictly legal UTF-8 octet sequence.

This needs to be clearer. Suppose you have the invalid sequence:

...<C2><C3><80>...

One could re-escape the entire sequence,

...%C2%C3%80...

or one could re-escape only the minimal-length invalid sequences, proceeding from right to left:

...%C2<C3><80>...

I assume that the latter is what is meant, but the text of the clause should say so explicitly. For that matter, any single octet above <7F>, taken on its own, is not a legal UTF-8 sequence, so a perverse reading of the clause would require all of them to be escaped!

4) Re-escape all octets that in UTF-8 represent characters that are not appropriate according to Section 5.1.

Should this not also say Section 4.1?

It is also unclear what to do with a sequence like %G1. Does it turn into %25G1?

Mark
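A minimal sketch of the minimal-length reading of step 3, in Python. It assumes the input is the octet sequence left after percent-escapes have been decoded, and it checks octets against the well-formed UTF-8 byte ranges of RFC 3629. It scans left to right, which yields the same result as right to left for the <C2><C3><80> example. The function names are hypothetical, not taken from the draft.

```python
def legal_utf8_length(data: bytes, i: int) -> int:
    """Length of the strictly legal UTF-8 sequence starting at data[i], or 0 if none."""
    b0 = data[i]
    if b0 <= 0x7F:
        return 1
    # (total length, allowed range for the second octet) per RFC 3629
    if 0xC2 <= b0 <= 0xDF:
        n, lo, hi = 2, 0x80, 0xBF
    elif b0 == 0xE0:
        n, lo, hi = 3, 0xA0, 0xBF      # excludes overlong forms
    elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        n, lo, hi = 3, 0x80, 0xBF
    elif b0 == 0xED:
        n, lo, hi = 3, 0x80, 0x9F      # excludes surrogates
    elif b0 == 0xF0:
        n, lo, hi = 4, 0x90, 0xBF      # excludes overlong forms
    elif 0xF1 <= b0 <= 0xF3:
        n, lo, hi = 4, 0x80, 0xBF
    elif b0 == 0xF4:
        n, lo, hi = 4, 0x80, 0x8F      # excludes code points above U+10FFFF
    else:
        return 0                       # C0, C1, F5..FF, or a stray continuation octet
    if i + n > len(data) or not (lo <= data[i + 1] <= hi):
        return 0
    for k in range(2, n):
        if not (0x80 <= data[i + k] <= 0xBF):
            return 0
    return n


def reescape_invalid_utf8(data: bytes) -> str:
    """Keep legal UTF-8 sequences as characters; re-escape each octet that is
    not part of one, one octet at a time (the minimal-length reading)."""
    out = []
    i = 0
    while i < len(data):
        n = legal_utf8_length(data, i)
        if n:
            out.append(data[i:i + n].decode("utf-8"))
            i += n
        else:
            out.append("%%%02X" % data[i])
            i += 1
    return "".join(out)
```

With this reading, `reescape_invalid_utf8(b"\xc2\xc3\x80")` re-escapes only <C2> and keeps <C3><80> as the character U+00C0. It also sidesteps the perverse reading: an octet above <7F> is escaped only when it cannot start, or continue, a legal sequence at that position.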