- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Tue, 15 Apr 2003 19:48:05 -0700
- To: public-iri@w3.org
>The text in that paragraph read
>
> For example, for a document with a URI of
> http://www.example.org/r%C3%A9sum%C3%A9.html, it is possible to
> construct a corresponding IRI (in XML notation, see Section 1.4):
> http://www.example.org/résumé.html (é stands for the
> e-acute character, and is the UTF-8 encoded and escaped
> representation of that character). On the other hand, for a document
> with an URI of http://www.example.org/r%E9sum%E9.html, the escaped
> octets cannot be converted to actual characters in an IRI, because
> the escaping is based on iso-8859-1 rather than UTF-8.
>
>The text in parentheses should have read:
>
> (é stands for the e-acute character, and %C3%A9 is the UTF-8
> encoded and escaped representation of that character)
>
>I have fixed that in my internal copy. Do you think that this change
>helps you to understand the paragraph better?
Only a little. It still makes me think that you are talking about an encoding.
Look at the paragraph that precedes this one:
In cases and for pieces where an encoding other than UTF-8 is used,
and for raw binary data encoded in URIs (see [RFC2397]), the octets
have to be %-escaped. In these situations, the ability of IRIs to
directly represent a wide character repertoire cannot be used.
How do you know the encoding of the URI? How can you tell if it is
UTF-8 (and therefore convertible to an IRI) or something else?
Asked another way, if I'm writing an IRI converter, how do I know
that this is OK:
http://www.example.org/r%C3%A9sum%C3%A9.html
But this isn't:
http://www.example.org/r%E9sum%E9.html
Is is simply because the second one fails a UTF8-decode test? What
about characters from other encondings that have values that are the
same as valid UTF8 values?
--Paul Hoffman, Director
--Internet Mail Consortium
Received on Tuesday, 15 April 2003 22:48:20 UTC