example recommended for escaping ASCII vs non-ASCII

2.4 Escaped Characters

My interpretation of RFC 2396 is that URI characters in the ASCII
range (U+007F and lower), when written in %-escaped form, must use
ASCII as the basis for the escaping. For example, to embed
"copyright 2003" in a URI, but with a copyright symbol (U+00A9)
rather than the word "copyright", you would need to encode the
space (U+0020) as "%20", regardless of what you used to encode
the copyright symbol (most likely "%C2%A9", if UTF-8 is being used
as the basis for escaping the non-ASCII characters).
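To make the point concrete, here is a minimal Python sketch of the rule as I read it. It assumes Python 3's urllib.parse.quote. The helper name escape_mixed and its charset parameter are hypothetical, for illustration only, and are not taken from EXSLT or the spec. Runs of ASCII characters are always escaped from their ASCII byte values. Only runs of non-ASCII characters go through the chosen charset. The last line shows the trap: naively escaping the whole string in a non-ASCII-compatible charset such as UTF-16BE turns the space into "%00%20" instead of "%20".

```python
from itertools import groupby
from urllib.parse import quote


def escape_mixed(text: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: percent-escape `text`, using ASCII bytes for
    ASCII characters and `charset` bytes only for non-ASCII runs."""
    out = []
    # Split the string into alternating runs of ASCII / non-ASCII characters.
    for is_ascii, run in groupby(text, key=lambda ch: ord(ch) < 0x80):
        chunk = "".join(run)
        # ASCII characters (U+007F and lower) are escaped from their ASCII
        # byte, so a space is always %20 whatever charset is chosen.
        encoding = "ascii" if is_ascii else charset
        out.append(quote(chunk, safe="", encoding=encoding))
    return "".join(out)


if __name__ == "__main__":
    s = "\u00a9 2003"  # copyright symbol, space, "2003"
    print(escape_mixed(s))                 # %C2%A9%202003
    print(escape_mixed(s, "iso-8859-1"))   # %A9%202003
    print(escape_mixed(s, "utf-16-be"))    # %00%A9%202003
    # Naive approach: escaping everything in UTF-16BE breaks the ASCII rule.
    print(quote(s, safe="", encoding="utf-16-be"))  # %00%A9%00%20%002%000%000%003
```

Splitting into runs rather than single characters keeps multi-character non-ASCII sequences together, which matters if the chosen charset is stateful; the sketch still assumes the charset can encode every non-ASCII character in the input (otherwise quote raises UnicodeEncodeError).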

It appears that this is still the case in the new spec, but you
might want to provide an example to underscore this fairly important
point. It's easy to miss, and probably affects quite a few
implementations. The URI encode/decode functions in EXSLT were
recently updated, at my urging, to enforce this.

-Mike

Received on Friday, 26 September 2003 09:52:49 UTC