- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Mon, 02 Jul 2007 17:38:53 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>, Chris Lilley <chris@w3.org>
- Cc: www-svg@w3.org, public-i18n-core@w3.org, public-iri@w3.org
I have assigned this issue the following id: transcodeNFC-103
(http://www.w3.org/International/iri-edit/#transcodeNFC-103).

I have changed the wording in step 1.b. of Section 3.1, Mapping of
IRIs to URIs, from:

   b. If the IRI is in some digital representation (e.g., an octet
      stream) in some known non-Unicode character encoding, convert
      the IRI to a sequence of characters from the UCS normalized
      according to NFC.

to:

   b. If the IRI is in some digital representation (e.g., an octet
      stream) in some known non-Unicode character encoding, convert
      the IRI to a sequence of characters from the UCS. The resulting
      sequence of characters SHOULD be normalized using NFC.

Any comments welcome!

Regards,    Martin.

At 21:12 05/06/20, Bjoern Hoehrmann wrote:
>
>* Chris Lilley wrote:
>>I can see that this is potentially an issue in CSS, but for XML where the
>>only two encodings guaranteed to work across XML parsers are UTF-8 and
>>UTF-16, and where use of any other (non codepoint subset - declaring
>>UTF-8 and then using US-ASCII is not relevant here) encoding has
>>always required declaration of the encoding, this seems to be less of a
>>problem.
>
>CSS 2.1 defines an encoding detection algorithm that is at least as
>deterministic as the encoding detection algorithm of the referencing
>format, that's not considerably different from XML. XML and CSS
>implementations also implement more encodings in practise, so there is
>not really much difference here.
>
>The problem is interoperability, existing implementations do not ever
>normalize (in accordance with the relevant specifications) and it is
>not well-defined when normalization is required to occur. RFC 3987
>assumes a static processing model where only a single textual data
>object is involved.
>
>Processing in a dynamic environment where multiple such objects are
>involved (e.g., an external script modifying the DOM tree of some other
>document) processing is at best unclear. It is also not defined what a
>"non-Unicode encoding" is and which revision of UAX #15 is to be applied
>(the NFC form of a string may change with each Unicode update).
>
>XML 1.1 for example mentions UTF-8, UTF-16 and UTF-32 as Unicode
>encodings, XML C14N mentions UTF-8, UTF-16, UTF-16BE, UTF-16LE,
>UCS-2, and UCS-4, to mention just two possible definitions for the
>term.
>
>And even if that is well-defined, the requirement is non-trivial to
>implement in a sane manner due to both complexity issues as well as
>footprint issues, a NFC normalizer is not a tiny piece of software.
>
>So we have existing implementations and new implementations that are
>required to behave differently, revisions to Unicode that allow
>implementations to behave differently over time, and good reasons to
>ignore or misunderstand the requirement, which in practise means that
>the requirement cannot be relied upon, which renders the requirement
>obsolete.
>
>Good luck exiting CR with proper tests for this in the test suite...
>--
>Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
>Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
>68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
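
For illustration only, the revised step 1.b might be sketched in Python roughly as follows. The function name legacy_iri_to_ucs, its normalize flag, and the example.org IRI are hypothetical and not taken from RFC 3987; the sketch assumes the source encoding is already known and that the NFC tables of the running Python's Unicode database are acceptable, which is exactly the revision dependency raised in the quoted message.

```python
import unicodedata


def legacy_iri_to_ucs(octets: bytes, encoding: str, normalize: bool = True) -> str:
    """Hypothetical helper sketching step 1.b (not an API from any spec).

    Decodes an IRI held as octets in a known non-Unicode encoding into a
    sequence of UCS characters. Because the revised wording says SHOULD,
    NFC normalization is applied by default but can be switched off.
    """
    chars = octets.decode(encoding)
    if normalize:
        chars = unicodedata.normalize("NFC", chars)
    return chars


# An IRI stored as ISO-8859-1 octets; 0xE9 decodes to the precomposed
# U+00E9, so the decoded string is already in NFC.
iri = legacy_iri_to_ucs(b"http://example.org/caf\xe9", "iso-8859-1")
print(ascii(iri))
print(unicodedata.is_normalized("NFC", iri))

# Some legacy encodings (windows-1258, for instance) represent certain
# accents as separate combining marks, so their decoded output can differ
# from its NFC form. Here the input is built by encoding, to avoid
# hard-coding byte values.
vietnamese = "a\u0301".encode("cp1258")
print(ascii(legacy_iri_to_ucs(vietnamese, "cp1258", normalize=False)))
print(ascii(legacy_iri_to_ucs(vietnamese, "cp1258")))

# The NFC result depends on the Unicode data shipped with the runtime.
print(unicodedata.unidata_version)
```

Keeping normalization behind a flag mirrors the move from a mandatory step to a SHOULD: a caller that must preserve the decoded characters exactly can skip it, at the cost of the interoperability concerns discussed in the quoted message.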
Received on Monday, 2 July 2007 08:45:51 UTC