- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Tue, 03 Jul 2007 10:44:00 +0900
- To: Addison Phillips <addison@yahoo-inc.com>, Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: public-i18n-core@w3.org, public-iri@w3.org
At 03:50 07/07/03, Addison Phillips wrote:

>[The following is a personal response.]
>
>Bjoern Hoehrmann wrote:
>> There should be no SHOULD, it's critical that applications get this
>> right. Where normalization is necessary or beneficial, it should be
>> applied to the text content before any IRI processing takes place.
>
>I agree with this and am not sure why the change was made?
>
>The problem with this edit is that step 1b. is now doing two
>things where formerly it did only one thing. Previously, it specified
>that the IRI be converted to a normalized Unicode character sequence
>without specifying how that took place.

Well, the split was necessary to change from MUST to SHOULD.

>Now it specifies converting from the legacy encoding and *then*
>(perhaps) normalizing. It reduces the requirement for NFC from an
>inherent MUST to an explicit SHOULD.

Yes. The way I understand it, this was to address concerns brought up
by the CSS WG that in some cases, implementations won't even know what
the original encoding was, because a whole CSS file has been transcoded
to Unicode and the original encoding thrown away; it would be a bad
hack to somehow keep the encoding around and do something
encoding-dependent long after everything is in Unicode.

>Now I understand that encoding converters may or may not produce a
>sequence that is NFC. For example, mapping a sequence containing the
>combining flavors of the Japanese dakuten or handakuten characters
>(i.e. U+3099, U+309A) to Unicode from a Japanese encoding will result
>in a combining sequence in several converters I have handy. I think it
>acceptable and even smart not to require the transcoding process to be
>normalizing. However, that wasn't the requirement in 1b. Normalization
>could be applied outside the transcoding process and still be
>conformant with the old text.

Yes, but as far as I understand, that wasn't the original issue.
Also, given the above fact, my guess is that apart from the above
issue, implementation conformance to the normalization requirement
in 1.b. is spotty at best.

It looks like this area of the spec really needs some more work,
so please keep your comments coming.

Regards,    Martin.

>So, I think this change is counter-productive. It would have been
>better to say:
>
>--
>   b. If the IRI is in some digital representation (e.g., an
>      octet stream) in some known non-Unicode character
>      encoding, convert the IRI to a sequence of characters
>      from the UCS normalized according to NFC. Note that not
>      all transcoding processes produce normalized text and that
>      normalization might need to be checked after transcoding
>      or applied separately.
>--
>
>WRT Bjoern's note:
>
>> Besides, this does not resolve disputes as to when step b) would
>> apply at all.
>
>I don't understand this comment, however. I'm not sure what disputes
>could arise here, since this section specifies a process for mapping
>IRIs to URIs. It doesn't specify any particular Unicode encoding (or
>that one be used at all), but it does require that the text be a
>sequence of characters in the Unicode character set. I note that the
>whole of XML, for example, is based on this exact same idea [1].
>
>Addison
>
>--
>Addison Phillips
>Globalization Architect -- Yahoo! Inc.
>Chair -- W3C Internationalization Core WG
>
>Internationalization is an architecture.
>It is not a feature.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
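As a minimal sketch of the dakuten point discussed above (Python used
here purely for illustration; the behavior of the specific legacy
converters Addison mentions is assumed, not reproduced): a transcoder
may legitimately emit a combining sequence such as U+304B U+3099, which
is valid Unicode but not in NFC, so normalization has to be checked or
applied as a step separate from the conversion itself.

    import unicodedata

    # Decoded text as a converter might produce it: HIRAGANA LETTER KA
    # (U+304B) followed by COMBINING KATAKANA-HIRAGANA VOICED SOUND
    # MARK (U+3099). Valid Unicode, but not in Normalization Form C.
    decoded = "\u304b\u3099"

    nfc = unicodedata.normalize("NFC", decoded)
    print(decoded == nfc)   # False: the transcoding output was not NFC
    print(nfc == "\u304c")  # True: NFC composes to HIRAGANA LETTER GA

This is the gap Addison's proposed wording for step 1b. points at:
conversion to the UCS and normalization to NFC are distinct operations,
and a conformant implementation may need to perform the second one
explicitly after the first.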
Received on Tuesday, 3 July 2007 02:24:12 UTC