Re: SVG12: IRI Processing rules and xlink:href from Bjoern Hoehrmann on 2005-06-20 (public-i18n-core@w3.org from April to June 2005)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 20 Jun 2005 14:12:49 +0200
To: Chris Lilley <chris@w3.org>
Cc: www-svg@w3.org, public-i18n-core@w3.org, public-iri@w3.org
Message-ID: <42bba31d.13047828@smtp.bjoern.hoehrmann.de>

* Chris Lilley wrote:
>I can see that this is potentially an issue in CSS, but for XML where the
>only two encodings guaranteed to work across XML parsers are UTF-8 and
>UTF-16, and where use of any other (non codepoint subset - declaring
>UTF-8 and then using US-ASCII is not relevant here) encoding has
>always required declaration of the encoding, this seems to be less of a
>problem.

CSS 2.1 defines an encoding detection algorithm that is at least as
deterministic as the encoding detection algorithm of the referencing
format, so it is not considerably different from XML. XML and CSS
implementations also implement more encodings in practice, so there is
not really much difference here.

The problem is interoperability: existing implementations do not ever
normalize (in accordance with the relevant specifications), and it is
not well-defined when normalization is required to occur. RFC 3987
assumes a static processing model where only a single textual data
object is involved.

In a dynamic environment where multiple such objects are involved
(e.g., an external script modifying the DOM tree of some other
document), processing is at best unclear. It is also not defined what a
"non-Unicode encoding" is or which revision of UAX #15 is to be applied
(the NFC form of a string may change with each Unicode update).
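
To illustrate why it matters whether and when normalization happens,
here is a minimal sketch in Python. It assumes the IRI-to-URI step is
"encode as UTF-8, then percent-encode"; the function name
iri_segment_to_uri and the sample strings are made up for this example
and come from no specification. Two canonically equivalent spellings of
the same word yield different URIs unless NFC is applied first.

```python
import unicodedata
from urllib.parse import quote


def iri_segment_to_uri(segment: str, normalize: bool) -> str:
    """Percent-encode one IRI path segment via UTF-8.

    Hypothetical helper: optionally applies NFC before encoding, which
    is the step whose timing is under discussion.
    """
    if normalize:
        segment = unicodedata.normalize("NFC", segment)
    # quote() accepts bytes and escapes every non-ASCII octet.
    return quote(segment.encode("utf-8"), safe="")


# "é" as one precomposed code point (U+00E9) ...
precomposed = "caf\u00e9"
# ... and as "e" followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = "cafe\u0301"

without_nfc = (iri_segment_to_uri(precomposed, False),
               iri_segment_to_uri(decomposed, False))
with_nfc = (iri_segment_to_uri(precomposed, True),
            iri_segment_to_uri(decomposed, True))
```

The result of the normalizing variant also depends on the Unicode data
the runtime ships with (unicodedata.unidata_version in Python), which
is the "which revision of UAX #15" question in practice.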

XML 1.1, for example, mentions UTF-8, UTF-16 and UTF-32 as Unicode
encodings; XML C14N mentions UTF-8, UTF-16, UTF-16BE, UTF-16LE,
UCS-2, and UCS-4, to mention just two possible definitions for the
term.
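
A small sketch of what that ambiguity means for an implementation,
again in Python. The two sets simply transcribe the lists given above;
the names XML11_UNICODE, C14N_UNICODE and is_unicode_encoding are
invented for this example. Any label that appears in only one list is
classified differently depending on which definition an implementer
picked.

```python
# Encoding labels taken from the two lists quoted in this message.
XML11_UNICODE = frozenset({"UTF-8", "UTF-16", "UTF-32"})
C14N_UNICODE = frozenset(
    {"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE", "UCS-2", "UCS-4"}
)


def is_unicode_encoding(label: str, definition: frozenset) -> bool:
    """Hypothetical check: is `label` a Unicode encoding under `definition`?"""
    return label.upper() in definition


# Labels on which the two definitions disagree (symmetric difference).
disputed = sorted(XML11_UNICODE ^ C14N_UNICODE)
```

With these two definitions, a document declared as UCS-2 would require
normalization under one reading and not under the other.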

And even if that is well-defined, the requirement is non-trivial to
implement in a sane manner due to both complexity and footprint
issues; an NFC normalizer is not a tiny piece of software.

So we have existing implementations and new implementations that are
required to behave differently, revisions to Unicode that allow
implementations to behave differently over time, and good reasons to
ignore or misunderstand the requirement. In practice this means that
the requirement cannot be relied upon, which renders the requirement
obsolete.

Good luck exiting CR with proper tests for this in the test suite...
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Monday, 20 June 2005 12:13:00 UTC