Re: SVG12: IRI Processing rules and xlink:href from Martin Duerst on 2007-07-02 (www-svg@w3.org from July 2007)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Mon, 02 Jul 2007 17:38:53 +0900
To: Bjoern Hoehrmann <derhoermi@gmx.net>, Chris Lilley <chris@w3.org>
Cc: www-svg@w3.org, public-i18n-core@w3.org, public-iri@w3.org
Message-Id: <6.0.0.20.2.20070702173555.0446b400@localhost>

I have assigned this issue the following id: transcodeNFC-103
(http://www.w3.org/International/iri-edit/#transcodeNFC-103).

I have changed the wording in step 1.b. of Section
3.1.  Mapping of IRIs to URIs from:

            b. If the IRI is in some digital representation (e.g., an
               octet stream) in some known non-Unicode character
               encoding, convert the IRI to a sequence of characters
               from the UCS normalized according to NFC.

to:

            b. If the IRI is in some digital representation (e.g., an
               octet stream) in some known non-Unicode character
               encoding, convert the IRI to a sequence of characters
               from the UCS. The resulting sequence of characters
               SHOULD be normalized using NFC.
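
As a rough illustration of the revised step 1.b (a sketch only, not part of the draft text): the helper name legacy_iri_to_uri, the SAFE set, and the choice of windows-1258 as the "known non-Unicode character encoding" are assumptions made for the example. windows-1258 is used because it encodes some accents as separate combining characters, so the decoded sequence is not already in NFC. The sketch percent-encodes the whole string uniformly and ignores the special handling of the ireg-name component.

```python
import unicodedata
from urllib.parse import quote

# Characters left unescaped: URI reserved characters plus "%" so that
# existing percent-escapes pass through. quote() never escapes ASCII
# letters, digits, or "-._~".
SAFE = "/:?#[]@!$&'()*+,;=%"


def legacy_iri_to_uri(raw: bytes, encoding: str, normalize: bool = True) -> str:
    """Hypothetical helper: map an IRI held in a legacy encoding to a URI."""
    # Step 1.b: convert the octets to a sequence of UCS characters.
    chars = raw.decode(encoding)
    # The resulting sequence SHOULD be normalized using NFC.
    if normalize:
        chars = unicodedata.normalize("NFC", chars)
    # Step 2 (simplified): percent-encode the UTF-8 octets of every
    # character that is not allowed in a URI.
    return quote(chars, safe=SAFE, encoding="utf-8")


# "e" followed by COMBINING ACUTE ACCENT, stored in windows-1258.
raw = "http://example.org/re\u0301sume\u0301".encode("cp1258")
print(legacy_iri_to_uri(raw, "cp1258"))
print(legacy_iri_to_uri(raw, "cp1258", normalize=False))
```

With normalization the path comes out as r%C3%A9sum%C3%A9; without it, as re%CC%81sume%CC%81. Two processors that differ on whether they apply the SHOULD therefore produce different URIs from the same octets.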

Any comments welcome!

Regards,    Martin.


At 21:12 05/06/20, Bjoern Hoehrmann wrote:
>
>* Chris Lilley wrote:
>>I can see that this is potentially an issue in CSS, but for XML where the
>>only two encodings guaranteed to work across XML parsers are UTF-8 and
>>UTF-16, and where use of any other (non codepoint subset - declaring
>>UTF-8 and then using US-ASCII is not relevant here) encoding has
>>always required declaration of the encoding, this seems to be less of a
>>problem.
>
>CSS 2.1 defines an encoding detection algorithm that is at least as
>deterministic as the encoding detection algorithm of the referencing
>format; that's not considerably different from XML. XML and CSS
>implementations also implement more encodings in practice, so there is
>not really much difference here.
>
>The problem is interoperability: existing implementations do not ever
>normalize (in accordance with the relevant specifications) and it is
>not well-defined when normalization is required to occur. RFC 3987
>assumes a static processing model where only a single textual data
>object is involved.
>
>Processing in a dynamic environment where multiple such objects are
>involved (e.g., an external script modifying the DOM tree of some other
>document) is at best unclear. It is also not defined what a
>"non-Unicode encoding" is and which revision of UAX #15 is to be applied
>(the NFC form of a string may change with each Unicode update).
>
>XML 1.1 for example mentions UTF-8, UTF-16 and UTF-32 as Unicode
>encodings, XML C14N mentions UTF-8, UTF-16, UTF-16BE, UTF-16LE,
>UCS-2, and UCS-4, to mention just two possible definitions for the
>term.
>
>And even if that is well-defined, the requirement is non-trivial to
>implement in a sane manner due to both complexity and footprint
>issues; an NFC normalizer is not a tiny piece of software.
>
>So we have existing implementations and new implementations that are
>required to behave differently, revisions to Unicode that allow
>implementations to behave differently over time, and good reasons to
>ignore or misunderstand the requirement, which in practice means that
>the requirement cannot be relied upon, which renders the requirement
>obsolete.
>
>Good luck exiting CR with proper tests for this in the test suite...
>-- 
>Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
>Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
>68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Monday, 2 July 2007 08:45:51 UTC