Re: SVG12: IRI Processing rules and xlink:href

I have assigned this issue the following id: transcodeNFC-103
(http://www.w3.org/International/iri-edit/#transcodeNFC-103).

I have changed the wording of step 1.b of Section 3.1,
"Mapping of IRIs to URIs", from:

            b. If the IRI is in some digital representation (e.g., an
               octet stream) in some known non-Unicode character
               encoding, convert the IRI to a sequence of characters
               from the UCS normalized according to NFC.

to:

            b. If the IRI is in some digital representation (e.g., an
               octet stream) in some known non-Unicode character
               encoding, convert the IRI to a sequence of characters
               from the UCS. The resulting sequence of characters
               SHOULD be normalized using NFC.
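
As a rough illustration of how the revised step might look in code (a
sketch only, not part of the draft text): the function name
iri_bytes_to_ucs, the example.org IRI, and the choice of windows-1258
are assumptions made for the example. windows-1258 is used because it
encodes some accents as separate combining characters, so decoding
alone does not produce NFC.

```python
import unicodedata


def iri_bytes_to_ucs(raw: bytes, encoding: str, normalize: bool = True) -> str:
    """Hypothetical helper for step 1.b: decode an IRI held in a known
    non-Unicode encoding to a sequence of UCS characters, then
    (SHOULD, hence optional here) normalize the result using NFC."""
    text = raw.decode(encoding)
    if normalize:
        # Uses the Unicode data bundled with the interpreter
        # (see unicodedata.unidata_version).
        text = unicodedata.normalize("NFC", text)
    return text


# Build sample input: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# stored as two bytes in windows-1258.
raw = "http://example.org/cafe\u0301".encode("cp1258")

decoded = iri_bytes_to_ucs(raw, "cp1258", normalize=False)
normalized = iri_bytes_to_ucs(raw, "cp1258")

# The decoded form keeps the combining mark; NFC composes it to U+00E9.
print(len(decoded) - len(normalized))
```

With the old wording the second form was the only conforming result;
with the new wording an implementation that stops at the first form
is still conforming, although not following the recommendation.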

Any comments welcome!

Regards,    Martin.


At 21:12 05/06/20, Bjoern Hoehrmann wrote:
>
>* Chris Lilley wrote:
>>I can see that this is potentially an issue in CSS, but for XML where the
>>only two encodings guaranteed to work across XML parsers are UTF-8 and
>>UTF-16, and where use of any other (non codepoint subset - declaring
>>UTF-8 and then using US-ASCII is not relevant here) encoding has
>>always required declaration of the encoding, this seems to be less of a
>>problem.
>
>CSS 2.1 defines an encoding detection algorithm that is at least as
>deterministic as the encoding detection algorithm of the referencing
>format; that is not considerably different from XML. XML and CSS
>implementations also implement more encodings in practice, so there is
>not really much difference here.
>
>The problem is interoperability: existing implementations do not ever
>normalize (in accordance with the relevant specifications), and it is
>not well-defined when normalization is required to occur. RFC 3987
>assumes a static processing model where only a single textual data
>object is involved.
>
>Processing in a dynamic environment where multiple such objects are
>involved (e.g., an external script modifying the DOM tree of some other
>document) is at best unclear. It is also not defined what a
>"non-Unicode encoding" is and which revision of UAX #15 is to be applied
>(the NFC form of a string may change with each Unicode update).
>
>XML 1.1 for example mentions UTF-8, UTF-16 and UTF-32 as Unicode
>encodings, XML C14N mentions UTF-8, UTF-16, UTF-16BE, UTF-16LE,
>UCS-2, and UCS-4, to mention just two possible definitions for the
>term.
>
>And even if that is well-defined, the requirement is non-trivial to
>implement in a sane manner due to both complexity issues and
>footprint issues; an NFC normalizer is not a tiny piece of software.
>
>So we have existing implementations and new implementations that are
>required to behave differently, revisions to Unicode that allow
>implementations to behave differently over time, and good reasons to
>ignore or misunderstand the requirement. In practice this means that
>the requirement cannot be relied upon, which renders the requirement
>obsolete.
>
>Good luck exiting CR with proper tests for this in the test suite...
>-- 
>Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
>Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
>68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

Received on Monday, 2 July 2007 08:45:51 UTC