Re: Some issues with the IRI document [legacyNFC-06] from Martin Duerst on 2003-04-17 (public-iri@w3.org from April 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 17 Apr 2003 17:22:46 -0400
To: Paul Hoffman / IMC <phoffman@imc.org>, public-iri@w3.org
Message-Id: <4.2.0.58.J.20030417171927.03ec3350@localhost>

At 15:31 03/04/16 -0700, Paul Hoffman / IMC wrote:

>At 1:52 PM -0400 4/16/03, Martin Duerst wrote:
>>What we are talking about here is that e.g. you receive an email
>>from Vietnam encoded in windows-1258, and this email contains
>>an IRI with some Vietnamese characters. Then to convert this
>>IRI into a URI, you have to use variant B) of step 1) in section
>>3.1, which will apply NFC when converting to Unicode in order
>>to convert the decompositions that occur in windows-1258 into
>>precomposed characters before then converting into UTF-8 and
>>using %-escaping.
>
>OK, I think I understand, but let me ask to be clear. Are you saying that 
>you must know the encoding of the context that the IRI appears in? If so, 
>I didn't catch that fact, and it should probably be stated before the examples.

I have changed variant B) in step 1) of section 3.1 from

    If the IRI is in some digital representation
    (e.g. an octet stream) in some non-Unicode encoding:
    Convert the IRI to a sequence of characters from the UCS
    normalized according to NFC.

to

    If the IRI is in some digital representation
    (e.g. an octet stream) in some *known* non-Unicode encoding:
    Convert the IRI to a sequence of characters from the UCS
    normalized according to NFC.

(the asterisks around "known" are for emphasis in this message only;
they are not part of the changed text).

Do you think that this helps, or do you think that other changes
are needed?
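
To make the windows-1258 case concrete, here is a minimal Python sketch
of variant B) followed by the UTF-8 and %-escaping steps. It is only an
illustration, not text from the draft: the function name
iri_bytes_to_uri and the example.org address are made up, it assumes
that the encoding is known and that Python's "cp1258" codec is an
adequate stand-in for windows-1258, and it simply escapes every
non-ASCII character rather than following the exact character classes
in the draft.

```python
import unicodedata


def iri_bytes_to_uri(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: map an IRI held as octets in a *known*
    non-Unicode encoding to a URI (illustrative sketch only)."""
    # Step 1, variant B): convert to UCS characters, normalized with NFC.
    text = unicodedata.normalize("NFC", raw.decode(encoding))

    # Then convert each non-ASCII character to UTF-8 and %-escape the
    # resulting octets. ASCII characters are passed through unchanged
    # (a simplification assumed for this sketch).
    parts = []
    for ch in text:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(parts)


# windows-1258 has no precomposed U+1EC7; the letter is carried as
# U+00EA followed by the combining dot below U+0323.
raw = "http://example.org/Vi\u00ea\u0323t".encode("cp1258")
print(iri_bytes_to_uri(raw, "cp1258"))
```

Without the NFC step, the two characters U+00EA and U+0323 would be
escaped separately, so the same IRI would yield a different URI
depending on the encoding it happened to arrive in.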

Regards,    Martin.

Received on Thursday, 17 April 2003 17:42:25 UTC