- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Mon, 20 Jun 2005 14:12:49 +0200
- To: Chris Lilley <chris@w3.org>
- Cc: www-svg@w3.org, public-i18n-core@w3.org, public-iri@w3.org
* Chris Lilley wrote:
>I can see that this is potentially an issue in CSS, but for XML where the
>only two encodings guaranteed to work across XML parsers are UTF-8 and
>UTF-16, and where use of any other (non codepoint subset - declaring
>UTF-8 and then using US-ASCII is not relevant here) encoding has
>always required declaration of the encoding, this seems to be less of a
>problem.

CSS 2.1 defines an encoding detection algorithm that is at least as deterministic as the encoding detection algorithm of the referencing format, so it is not considerably different from XML. XML and CSS implementations also support more encodings in practice, so there is not really much difference here.

The problem is interoperability: existing implementations do not ever normalize (in accordance with the relevant specifications), and it is not well-defined when normalization is required to occur. RFC 3987 assumes a static processing model where only a single textual data object is involved. Processing in a dynamic environment where multiple such objects are involved (e.g., an external script modifying the DOM tree of some other document) is at best unclear.

It is also not defined what a "non-Unicode encoding" is, or which revision of UAX #15 is to be applied (the NFC form of a string may change with each Unicode update). XML 1.1, for example, mentions UTF-8, UTF-16 and UTF-32 as Unicode encodings; XML C14N mentions UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and UCS-4, to mention just two possible definitions for the term.

And even if that were well-defined, the requirement is non-trivial to implement in a sane manner, because of both complexity and footprint: an NFC normalizer is not a tiny piece of software.
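As a rough illustration of the interoperability concern, the following Python sketch (not part of the original message) shows how the same visible string maps to two different URIs depending on whether NFC normalization happened before percent-encoding. The file name and the helper `to_uri` are made up for the example, and `urllib.parse.quote` is only a simplified stand-in for the IRI-to-URI mapping in RFC 3987. The result of `unicodedata.normalize` also depends on the Unicode version bundled with the interpreter, which is the revision question raised above.

```python
import unicodedata
from urllib.parse import quote


def to_uri(path: str) -> str:
    """Hypothetical helper: percent-encode the UTF-8 bytes of a path.

    This is a simplification, not the full RFC 3987 mapping.
    """
    return quote(path, safe="/")


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
decomposed = "re\u0301sume\u0301.svg"

# NFC composes each pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# The Unicode version this interpreter's normalizer implements.
print("Unicode data version:", unicodedata.unidata_version)

# Both strings render the same, but they are different code point sequences
# and therefore produce different URIs.
print(to_uri(decomposed))
print(to_uri(composed))
print(to_uri(decomposed) == to_uri(composed))
```

An implementation that normalizes and one that does not would thus dereference different resources for what an author sees as the same reference.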
So we have existing implementations and new implementations that are required to behave differently, revisions to Unicode that allow implementations to behave differently over time, and good reasons to ignore or misunderstand the requirement. In practice this means that the requirement cannot be relied upon, which renders the requirement obsolete. Good luck exiting CR with proper tests for this in the test suite...
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Monday, 20 June 2005 12:13:00 UTC