Re: Some issues with the IRI document [legacyNFC-06]

From: Martin Duerst <duerst@w3.org>
Date: Wed, 16 Apr 2003 13:52:07 -0400
To: Paul Hoffman / IMC <phoffman@imc.org>, public-iri@w3.org

At 19:48 03/04/15 -0700, Paul Hoffman / IMC wrote:

>>At 08:14 03/04/08 -0700, Paul Hoffman / IMC wrote:
>>>Technical issues:
>>>I do not understand the logic of having Variants (B) and (C) in step 1 
>>>in section 3.1. One is normalized, the other one isn't. Doesn't this 
>>>sound like a recipe for disaster? Why did you differentiate between 
>>>these two cases?
>>This is listed as issue
>>This is carefully based on the principle of early uniform normalization
>>as described in the W3C Character Model. The assumption is that
>>Unicode-based encodings are for the most part already in NFC
>>(and where they are not, this may be on purpose). However,
>>for non-Unicode encodings, normalization when converting is
>>sometimes necessary (the most obvious example is windows-1258,
>>for Vietnamese).
>I guess this goes back to my question from the previous message: how do 
>you know what encoding you are looking at?

The encoding issues in
(this is later in the archives, but I'm assuming this is the
message you are talking about) are on a different level than
the encoding we are talking about here.

What we are talking about here is, for example, that you receive
an email from Vietnam encoded in windows-1258, and this email
contains an IRI with some Vietnamese characters. To convert this
IRI into a URI, you have to use variant B) of step 1) in section
3.1, which applies NFC when converting to Unicode. This turns the
decomposed sequences that occur in windows-1258 into precomposed
characters before the result is converted to UTF-8 and %-escaped.
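
As a rough illustration of the windows-1258 case, here is a minimal Python sketch of the sequence described above: decode the legacy bytes to Unicode, apply NFC, then convert to UTF-8 and %-escape. The function name `legacy_iri_to_uri`, the sample IRI, and the choice of characters left unescaped are assumptions made for this example and are not taken from the draft. The sketch also does not treat host names specially.

```python
import unicodedata
from urllib.parse import quote

# Characters left unescaped in this sketch: URI reserved characters plus "%"
# (so existing escapes are not escaped twice). This set is an assumption.
SAFE = "/:?#[]@!$&'()*+,;=%"


def legacy_iri_to_uri(raw: bytes, encoding: str = "cp1258") -> str:
    """Hypothetical helper: convert an IRI in a legacy encoding to a URI."""
    # Decode the non-Unicode bytes; windows-1258 yields base letters
    # followed by combining marks for many Vietnamese characters.
    text = unicodedata.normalize("NFC", raw.decode(encoding))
    # quote() encodes non-ASCII characters as UTF-8 and %-escapes them.
    return quote(text, safe=SAFE)


# Build sample input: "e with circumflex" followed by COMBINING DOT BELOW,
# which is how windows-1258 represents U+1EC7.
sample = "http://example.org/vi\u00ea\u0323t".encode("cp1258")
print(legacy_iri_to_uri(sample))  # http://example.org/vi%E1%BB%87t
```

Without the NFC step, the same input would be escaped as two separate characters (the base letter and the combining mark), so the resulting URI would differ from the one produced from precomposed UTF-8 input.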

Regards,    Martin.
Received on Wednesday, 16 April 2003 15:09:02 UTC
