- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 19 Nov 2002 05:47:16 +0900
- To: "Mark Davis" <mark@macchiato.com>
- Cc: <www-international@w3.org>
Hello Mark,

Many thanks for your comments. Some detail questions below.
Looking forward to your feedback.

At 13:16 02/11/15 -0800, Mark Davis wrote:

>One of the steps is the following:
>
>   3) Re-escape any octets that are not part of a strictly legal UTF-8
>      octet sequence.
>
>This needs to be clearer. Suppose you have the invalid sequence:
>
>...<C2><C3><80>...
>
>One could re-escape the entire sequence,
>
>...%C2%C3%80...
>
>or one could re-escape the minimal-length invalid sequences, preceding
>from right to left.
>
>...%C2<C3><80>...
>
>I assume that the latter is what is meant, but it should be clearer in the
>text of the clause. For that matter, any single octet above <7F> is
>invalid, so a perverse reading of the clause would require all of them to
>be escaped!

My interpretation is that <C3><80> is a strictly legal UTF-8 sequence,
and therefore the <C3> and <80> octets are part of a strictly legal
UTF-8 octet sequence, and so only <C2> can be re-escaped.

What would you propose to make this easier to understand?
Would it be better to replace 'a' by 'any'?

   3) Re-escape any octets that are not part of any strictly legal UTF-8
      octet sequence.

Or do you have another idea of how to make this clearer?

>   4) Re-escape all octets that in UTF-8 represent characters that
>      are not appropriate according to Section 5.1.
>
>Should this not also say Section 4.1?

Good point. Done.

>It is also unclear what to do with a sequence like %G1. Does it turn into
>%25G1?

That's not a legal URI, so it is not a legal input. So we should
never get it. If we get it, it's not converted to an octet in step 2),
and can therefore not be re-escaped. But maybe it would help to say
clearly that the 're-escape' refers to those octets produced in
step 2):

   3) Re-escape any octets produced in step 2) that are not part of a/any
      strictly legal UTF-8 octet sequence.

What do you think?

Regards,
Martin.
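
The reading of step 3) discussed above can be illustrated with a small sketch. It is not taken from the draft: the function names are hypothetical, the scan direction (left to right) is an assumption, step 2) is simplified to unescape only octets above <7F>, and step 4) is left out entirely. Python's strict UTF-8 decoder stands in for "strictly legal".

```python
import re

# Matches a well-formed escape only; something like %G1 is never matched,
# so it is never turned into an octet and never re-escaped.
ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def legal_utf8_length(octets: bytes, i: int) -> int:
    """Length of the strictly legal multi-octet UTF-8 sequence at i, or 0."""
    for n in (2, 3, 4):
        chunk = octets[i:i + n]
        if len(chunk) < n:
            break
        try:
            chunk.decode("utf-8")  # strict: rejects overlong forms and surrogates
            return n
        except UnicodeDecodeError:
            continue
    return 0


def _unescape_high(match):
    # Simplified step 2): only escapes of octets above 0x7F become raw octets.
    value = int(match.group(1), 16)
    return bytes([value]) if value > 0x7F else match.group(0)


def reescape_illegal(uri: str) -> str:
    """Hypothetical helper: simplified step 2) followed by step 3)."""
    octets = ESCAPE.sub(_unescape_high, uri.encode("ascii"))
    out = []
    i = 0
    while i < len(octets):
        b = octets[i]
        if b < 0x80:
            out.append(chr(b))  # ASCII passes through untouched
            i += 1
            continue
        n = legal_utf8_length(octets, i)
        if n:
            # Octets that are part of a legal sequence are kept (shown decoded).
            out.append(octets[i:i + n].decode("utf-8"))
            i += n
        else:
            # Step 3): this octet is not part of any legal sequence.
            out.append(f"%{b:02X}")
            i += 1
    return "".join(out)
```

With this sketch, `reescape_illegal("%C2%C3%80")` re-escapes only the <C2> octet and keeps <C3><80> as one character, and `reescape_illegal("%G1")` returns its input unchanged.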
Received on Monday, 18 November 2002 15:57:31 UTC