- From: Paul Hoffman / IMC <phoffman@imc.org>
- Date: Tue, 15 Apr 2003 19:48:05 -0700
- To: public-iri@w3.org
>The text in that paragraph read
>
>   For example, for a document with a URI of
>   http://www.example.org/r%C3%A9sum%C3%A9.html, it is possible to
>   construct a corresponding IRI (in XML notation, see Section 1.4):
>   http://www.example.org/résumé.html (é stands for the
>   e-acute character, and is the UTF-8 encoded and escaped
>   representation of that character). On the other hand, for a document
>   with an URI of http://www.example.org/r%E9sum%E9.html, the escaped
>   octets cannot be converted to actual characters in an IRI, because
>   the escaping is based on iso-8859-1 rather than UTF-8.
>
>The text in parentheses should have read:
>
>   (é stands for the e-acute character, and %C3%A9 is the UTF-8
>   encoded and escaped representation of that character)
>
>I have fixed that in my internal copy. Do you think that this change
>helps you to understand the paragraph better?

Only a little. It still makes me think that you are talking about an
encoding. Look at the paragraph that precedes this one:

   In cases and for pieces where an encoding other than UTF-8 is used,
   and for raw binary data encoded in URIs (see [RFC2397]), the octets
   have to be %-escaped. In these situations, the ability of IRIs to
   directly represent a wide character repertoire cannot be used.

How do you know the encoding of the URI? How can you tell if it is
UTF-8 (and therefore convertible to an IRI) or something else?

Asked another way, if I'm writing an IRI converter, how do I know that
this is OK:

   http://www.example.org/r%C3%A9sum%C3%A9.html

But this isn't:

   http://www.example.org/r%E9sum%E9.html

Is it simply because the second one fails a UTF-8 decode test? What
about characters from other encodings that have values that are the
same as valid UTF-8 values?

--Paul Hoffman, Director
--Internet Mail Consortium
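
To make the "UTF-8 decode test" in the question above concrete, here is a minimal Python sketch. It is only an illustration of the test being asked about, not something the draft specifies: the function name percent_escapes_are_utf8 is hypothetical, and it assumes the converter simply unescapes every %XX octet and attempts a strict UTF-8 decode.

```python
from urllib.parse import unquote_to_bytes


def percent_escapes_are_utf8(uri: str) -> bool:
    """Hypothetical check: do the unescaped octets of `uri` form valid UTF-8?"""
    octets = unquote_to_bytes(uri)  # turn each %XX escape into a raw octet
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The two URIs from the message.
print(percent_escapes_are_utf8("http://www.example.org/r%C3%A9sum%C3%A9.html"))  # True
print(percent_escapes_are_utf8("http://www.example.org/r%E9sum%E9.html"))        # False

# The ambiguity raised in the last question: the octets C3 A9 are valid UTF-8
# (one character, e-acute) but are also a legal two-character iso-8859-1
# string, so passing the test does not prove that UTF-8 was intended.
ambiguous = unquote_to_bytes("%C3%A9")
print(len(ambiguous.decode("utf-8")), len(ambiguous.decode("latin-1")))  # 1 2
```

A failed decode shows the escapes are not UTF-8, but a successful decode only shows they could be, which is exactly the gap the last question points at.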
Received on Tuesday, 15 April 2003 22:48:20 UTC