Re: proposal for Issue #23 (relax requirement for NFC transcoding) from Bjoern Hoehrmann on 2010-09-30 (public-iri@w3.org from September 2010)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 30 Sep 2010 23:48:53 +0200
To: "Phillips, Addison" <addison@lab126.com>
Cc: "public-iri@w3.org" <public-iri@w3.org>
Message-ID: <f61aa65m98ud10s62c5kjeb97fnunnn351@hive.bjoern.hoehrmann.de>

* Phillips, Addison wrote:
>1. Change the text above to read:
>
>   If the IRI or IRI reference is an octet stream in some known non-
>   Unicode character encoding, convert the IRI to a sequence of
>   characters from the UCS.
>
>   In other cases (written on paper, read aloud, or otherwise
>   represented independent of any character encoding) represent the IRI
>   as a sequence of characters from the UCS.

IRIs are by definition a sequence of characters from the UCS. With the
requirement gone, I do not think there is a point in having this section
in the document.
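
For reference, the conversion step in the quoted text amounts to an ordinary charset decode. A minimal Python sketch, assuming the octets are known to be ISO-8859-1; the helper name and the sample IRI are hypothetical and not taken from the draft:

```python
def iri_from_octets(octets: bytes, encoding: str) -> str:
    """Decode octets in a known legacy encoding into a string of
    Unicode (UCS) characters. No normalization is applied here."""
    return octets.decode(encoding)


# Hypothetical IRI stored as ISO-8859-1 octets (0xE9 is "é" in that encoding).
raw = b"http://example.org/r\xe9sum\xe9"
iri = iri_from_octets(raw, "iso-8859-1")
```

Once decoded, the result is simply a sequence of UCS characters, which is what an IRI is by definition.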

>2. Add the following text just after the second paragraph above:
>
>NOTE: Some character encodings or transcriptions can be converted to or
>represented by more than one sequence of Unicode characters. Ideally the
>resulting IRI would use a normalized form, such as Unicode Normalization
>Form C (NFC, [UTR15]), since that ensures a stable, consistent
>representation that is most likely to produce the intended results.
>Implementers and users are cautioned that, while denormalized character
>sequences are valid, they might be difficult for other users or
>processes to guess and might produce unexpected results.

Normalization is already discussed in 5.3.2.2 "Character Normalization";
any discussion of it should be moved there if it is not already covered.
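
To illustrate the point in the quoted NOTE that one visible string can correspond to more than one sequence of Unicode characters, here is a small Python sketch using the standard `unicodedata` module. The sample strings are my own and only meant as an illustration:

```python
import unicodedata

composed = "caf\u00e9"      # "é" as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# The two sequences render the same but differ code point by code point.
same_before = composed == decomposed

# After normalizing to NFC both become the same sequence.
same_after = unicodedata.normalize("NFC", decomposed) == composed
```

Both sequences are valid; they only compare equal after normalization.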
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Thursday, 30 September 2010 23:36:09 UTC