- From: by way of Martin Duerst <mike@skew.org>
- Date: Fri, 26 Sep 2003 09:52:06 -0400
- To: uri@w3.org
2.4 Escaped Characters

My interpretation of RFC 2396 is that URI characters in the ASCII range (U+007F and lower), when written in %-escaped form, must be escaped on the basis of their ASCII octets, regardless of which encoding is used to escape any non-ASCII characters in the same URI.

For example, to embed "copyright 2003" in a URI, but with a copyright symbol (U+00A9) in place of the word "copyright", you would need to encode the space (U+0020) as "%20", no matter how you encoded the copyright symbol (most likely as "%C2%A9", if UTF-8 is the basis for escaping the non-ASCII characters).

This appears to still be the case in the new spec, but you might want to provide an example to underscore this fairly important point. It is easy to miss, and it probably affects quite a few implementations. The URI encode/decode functions in EXSLT were recently updated, at my urging, to enforce this.

-Mike
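
The example described above can be sketched in a few lines. This is only an illustration, assuming Python's standard urllib.parse.quote as the escaping function; the variable names are made up for the sketch, and nothing here comes from RFC 2396 or EXSLT. It shows that the space comes out as "%20" whichever encoding is chosen for the copyright symbol.

```python
from urllib.parse import quote, unquote

# U+00A9 (copyright sign), U+0020 (space), then "2003".
text = "\u00a9 2003"

# Non-ASCII characters escaped on the basis of their UTF-8 octets.
utf8_escaped = quote(text, safe="", encoding="utf-8")

# The same string escaped on the basis of ISO-8859-1 octets instead
# (an alternative choice, used here only for comparison).
latin1_escaped = quote(text, safe="", encoding="iso-8859-1")

print(utf8_escaped)    # %C2%A9%202003
print(latin1_escaped)  # %A9%202003

# Decoding with the matching encoding recovers the original string.
print(unquote(utf8_escaped, encoding="utf-8") == text)  # True
```

In both outputs only the escape for U+00A9 differs; the space is "%20" in each, since the ASCII-range character is escaped on the basis of ASCII.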
Received on Friday, 26 September 2003 09:52:49 UTC