
example recommended for escaping ASCII vs non-ASCII

From: by way of Martin Duerst <mike@skew.org>
Date: Fri, 26 Sep 2003 09:52:06 -0400
Message-Id: <4.2.0.58.J.20030926095201.0614d338@localhost>
To: uri@w3.org




2.4 Escaped Characters

My interpretation of RFC 2396 is that URI characters in the ASCII
range (U+007F and lower), when written in %-escaped form, must use
ASCII as the basis for the escaping. For example, to embed
"copyright 2003" in a URI, but with a copyright symbol (U+00A9)
rather than the word "copyright", you would need to encode the
space (U+0020) as "%20", regardless of what you used to encode
the copyright symbol (most likely "%C2%A9", if UTF-8 is being used
as the basis for escaping the non-ASCII characters).
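
A minimal sketch of the rule described above, in Python. The helper
name escape_mixed and its parameter are hypothetical and come from
neither RFC 2396 nor EXSLT; the sketch only assumes the standard
library's urllib.parse.quote. It escapes each ASCII-range character
from its ASCII octet and uses the chosen encoding only for characters
above U+007F:

```python
from urllib.parse import quote


def escape_mixed(text, non_ascii_encoding="utf-8"):
    """Percent-escape text (hypothetical helper, for illustration only).

    Characters at U+007F and lower are always escaped from their ASCII
    octets; non_ascii_encoding applies only to characters above U+007F.
    """
    parts = []
    for ch in text:
        if ord(ch) <= 0x7F:
            # ASCII range: the escape is based on ASCII, no matter which
            # encoding is used for the rest of the string.
            parts.append(quote(ch, safe="", encoding="ascii"))
        else:
            # Non-ASCII: the escape is based on the chosen encoding.
            parts.append(quote(ch, safe="", encoding=non_ascii_encoding))
    return "".join(parts)


text = "\u00a9 2003"  # copyright symbol, space, "2003"

utf8_form = escape_mixed(text)                  # UTF-8 for the symbol
latin1_form = escape_mixed(text, "iso-8859-1")  # ISO-8859-1 for the symbol
ebcdic_form = escape_mixed(text, "cp037")       # an EBCDIC code page

for form in (utf8_form, latin1_form, ebcdic_form):
    print(form)
```

With UTF-8 the result is "%C2%A9%202003" and with ISO-8859-1 it is
"%A9%202003". In all three cases the space is written as "%20" and the
digits stay as they are, even when the copyright symbol is escaped
from an EBCDIC code page in which the space has a different octet
value.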

It appears that this is still the case in the new spec, but you
might want to provide an example to underscore this fairly important
point. It's easy to miss, and probably affects quite a few
implementations. The URI encode/decode functions in EXSLT were
recently updated, at my urging, to enforce this.

-Mike
Received on Friday, 26 September 2003 09:52:49 UTC
