Re: SVG12: IRI Processing rules and xlink:href from Addison Phillips on 2007-07-02 (public-iri@w3.org from July 2007)

From: Addison Phillips <addison@yahoo-inc.com>
Date: Mon, 02 Jul 2007 11:50:02 -0700
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: Martin Duerst <duerst@it.aoyama.ac.jp>, public-i18n-core@w3.org, public-iri@w3.org
Message-ID: <468948DA.5040206@yahoo-inc.com>

[The following is a personal response.]

Bjoern Hoehrmann wrote:
> There should be no SHOULD, it's critical that applications get this
> right. Where normalization is necessary or beneficial, it should be
> applied to the text content before any IRI processing takes place.

I agree with this, and I am not sure why the change was made.

The problem with this edit is that step 1b. is now doing two
things where formerly it did only one thing. Previously, it specified
that the IRI be converted to a normalized Unicode character sequence
without specifying how that took place.

Now it specifies converting from the legacy encoding and *then*
(perhaps) normalizing. This reduces the requirement for NFC from an
inherent MUST to an explicit SHOULD.

Now I understand that encoding converters may or may not produce a
sequence that is NFC. For example, mapping a sequence containing the
combining flavors of Japanese dakuten or handakuten characters (i.e. 
U+3099, U+309A) to Unicode from a Japanese encoding will result in a 
combining sequence in several converters I have handy. I think it 
acceptable and even smart not to require the transcoding process to be 
normalizing. However, that wasn't the requirement in 1b. Normalization 
could be applied outside the transcoding process and still be conformant 
with the old text.
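
To illustrate normalization applied outside the transcoding process,
here is a minimal Python sketch. The names transcode_then_nfc and is_nfc
are hypothetical, and Python's standard codecs merely stand in for
whatever converter an implementation uses. The decomposed sample string
is built by hand; it is not claimed to be the output of any particular
converter.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def transcode_then_nfc(octets: bytes, encoding: str) -> str:
    """Decode legacy-encoded octets, then normalize as a separate step.

    Hypothetical helper: the decoder is not assumed to emit NFC itself.
    """
    text = octets.decode(encoding)  # transcoding; may yield combining sequences
    if not is_nfc(text):            # check after transcoding
        text = unicodedata.normalize("NFC", text)
    return text


# HIRAGANA LETTER KA followed by COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK,
# the kind of combining sequence a converter might produce.
decomposed = "\u304b\u3099"
composed = unicodedata.normalize("NFC", decomposed)  # U+304C HIRAGANA LETTER GA
```

The point of the sketch is only that the check and the normalization sit
after the decode call, so the converter itself need not be normalizing.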

So, I think this change is counter-productive. It would have been
better to say:

--
            b. If the IRI is in some digital representation (e.g., an
               octet stream) in some known non-Unicode character
               encoding, convert the IRI to a sequence of characters
               from the UCS normalized according to NFC. Note that not
               all transcoding processes produce normalized text and that
               normalization might need to be checked after transcoding
               or applied separately.
--

WRT Bjoern's note:

> Besides, this does not resolve disputes as to when step b) would apply
> at all.

I don't understand this comment, however. I'm not sure what disputes
could arise here, since this section specifies a process for mapping
IRIs to URIs. It doesn't specify any particular Unicode encoding (or
that one be used at all), but it does require that the text be a
sequence of characters in the Unicode character set. I note that the
whole of XML, for example, is based on this exact same idea [1].
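
As a rough sketch of why the mapping only needs a sequence of
characters, the following Python shows the percent-encoding part of the
IRI-to-URI mapping operating on a character string, regardless of how
that string was originally encoded. The function name
iri_to_uri_sketch is hypothetical, and the sketch is deliberately
simplified: it treats every non-ASCII character as one to be escaped and
does not attempt any special handling of the host component.

```python
def iri_to_uri_sketch(iri: str) -> str:
    """Percent-encode the UTF-8 octets of each non-ASCII character.

    Simplified illustration: the input is assumed to already be a
    sequence of Unicode characters (the result of step 1).
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters pass through unchanged
        else:
            # Convert the character to UTF-8 and escape each octet as %HH.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


example = iri_to_uri_sketch("http://example.org/r\u00e9sum\u00e9")
```

The input here is a Python str, i.e. a sequence of Unicode characters;
UTF-8 appears only inside the mapping, not as a requirement on how the
IRI was stored or transmitted.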

Addison

-- 
Addison Phillips
Globalization Architect -- Yahoo! Inc.
Chair -- W3C Internationalization Core WG

Internationalization is an architecture.
It is not a feature.

Received on Monday, 2 July 2007 18:50:27 UTC