- From: Mike Brown <mike@skew.org>
- Date: Fri, 26 Sep 2003 11:32:00 -0600 (MDT)
- To: uri@w3.org
I want to revise my suggestion for rewording section 2.1. In the last paragraph, I didn't follow my own advice! Also, there are a couple of other details that I want to address.

1. I missed a "character set".

2. Prior to the adoption of RFCs 2277 and 2718, protocols and URI schemes were free to mandate the use of encodings other than UTF-8 as the basis for %-escaping, or to not speak to the issue at all (HTTP being the most notorious example). This should be acknowledged when recommending UTF-8.

3. Link the first mention of escaping to section 2.4 (#escape).

4. Even though the reader probably can figure out what is meant, the recommended action to encode-then-escape can be difficult to follow to the letter. If you escape an octet, then you have a triplet of characters. So far, so good. But if you don't escape an octet, then you've got ...an octet. You might say to just use characters represented by the unescaped octets, but then this makes me think the whole example is redundant, saying, in effect, "to escape certain characters, encode them all so you know how to escape them, but then just escape the ones you need to." What's the point? Just drop this entirely.

So, the last paragraph should be sufficient if it reads like this:

    In accordance with the trend toward UTF-8 [RFC2279] (see also [RFC2277] and [RFC2718]), when a URI scheme defines a component that represents textual data consisting of characters from the Unicode / ISO/IEC 10646 repertoire and does not mandate the use of some other encoding, we recommend using UTF-8 [RFC2279] to determine the octets used to escape [#escape] characters that are not in the unreserved set.
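To illustrate how the proposed wording might be applied, here is a minimal Python sketch. It works character by character: unreserved characters pass through untouched, and only the characters that need escaping are run through UTF-8 to obtain their octets. The function name escape_component is hypothetical, and the unreserved set shown (letters, digits, "-", ".", "_", "~") is an assumption; the exact set depends on which revision of the URI specification is in force, and a scheme that mandates another encoding would use that encoding instead of UTF-8.

```python
# Assumed unreserved set: ASCII letters, digits, and "-", ".", "_", "~".
# Other revisions of the URI specification define a slightly different set.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def escape_component(text: str) -> str:
    """Escape every character outside UNRESERVED (hypothetical helper).

    Only the characters being escaped are encoded as UTF-8; each
    resulting octet becomes a "%" followed by two uppercase hex digits.
    """
    out = []
    for ch in text:
        if ch in UNRESERVED:
            # Unreserved characters stay as characters; no octets involved.
            out.append(ch)
        else:
            # UTF-8 determines the octets; each octet becomes one triplet.
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)
```

For example, escape_component("café") leaves "c", "a" and "f" alone and turns "é" into the two triplets for its UTF-8 octets, C3 and A9. Note that this sketch escapes reserved characters such as "/" as well, so it is only suitable for data within a single component, not for a whole URI.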
Received on Friday, 26 September 2003 13:31:57 UTC