Re: proposal for Issue #23 (relax requirement for NFC transcoding) from Julian Reschke on 2010-09-29 (public-iri@w3.org from September 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Wed, 29 Sep 2010 18:14:31 +0200
To: "Phillips, Addison" <addison@lab126.com>
CC: "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <4CA365E7.2030203@gmx.de>

On 29.09.2010 18:05, Phillips, Addison wrote:
> ...
> I propose the following changes.
>
> 1. Change the text above to read:
>
>     If the IRI or IRI reference is an octet stream in some known non-
>     Unicode character encoding, convert the IRI to a sequence of
>     characters from the UCS.
>
>     In other cases (written on paper, read aloud, or otherwise
>     represented independent of any character encoding) represent the IRI
>     as a sequence of characters from the UCS.
>
> 2. Add the following text just after the second paragraph above:
>
> NOTE: Some character encodings or transcriptions can be converted to or represented by more than one sequence of Unicode characters. Ideally the resulting IRI would use a normalized form, such as Unicode Normalization Form C (NFC, [UTR15]), since that ensures a stable, consistent representation that is most likely to produce the intended results. Implementers and users are cautioned that, while denormalized character sequences are valid, they might be difficult for other users or processes to guess and might produce unexpected results.
> ...

+1

In particular, I think this matches what implementations actually do.
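
For illustration only, here is a minimal sketch of the situation the quoted NOTE describes, assuming Python's standard library (`unicodedata.normalize`, `urllib.parse.quote`). The example strings and the helper name `to_nfc` are hypothetical and are not taken from the draft or from any particular implementation. It shows a legacy-encoded octet stream being converted to UCS characters, and two valid Unicode spellings of the same visible text that differ until NFC is applied.

```python
import unicodedata
from urllib.parse import quote

# Step 1 of the proposal: an octet stream in a known non-Unicode
# encoding (ISO-8859-1 here, chosen for the example) is converted
# to a sequence of UCS characters.
legacy_octets = b"caf\xe9"
from_legacy = legacy_octets.decode("iso-8859-1")

# Two valid Unicode spellings of the same visible text.
composed = "caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# The two spellings are different character sequences, so they
# percent-encode to different octets when mapped to a URI.
print(composed == decomposed)   # False
print(quote(composed))          # caf%C3%A9
print(quote(decomposed))        # cafe%CC%81

# After NFC normalization both spellings agree.
print(to_nfc(decomposed) == composed)  # True
print(from_legacy == composed)         # True
```

Nothing here requires normalization; the denormalized sequence stays valid. It is just harder for another user or process to guess.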

Best regards, Julian

Received on Wednesday, 29 September 2010 16:15:09 UTC