- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Sat, 07 May 2005 16:33:03 +0200
- To: Martin Duerst <duerst@it.aoyama.ac.jp>
- Cc: www-style@w3.org, public-i18n-core@w3.org
* Martin Duerst wrote:
>>For an example of the difference, see:
>>
>>  http://lists.w3.org/Archives/Public/www-style/2005Mar/0102

>This example is well worked out, but quite theoretical.

That's a highly theoretical remark, as implementations have to implement
this regardless of how theoretical the example might be. It is in fact
far more likely that implementations encounter cases like this one than
style sheets in windows-1258 that depend on implementations doing NFC
normalization.

>The original character encoding is part of the Infoset. See point 6 at
>http://www.w3.org/TR/2004/REC-xml-infoset-20040204/#infoitem.document.
>Given this, I don't think that claims like "nigh-on-impossible" are
>justified. Getting at that info may be a bit of a hack in some
>implementations, but "nigh-on-impossible" it is not.

There is no finite deterministic algorithm that maps each possible input
to a well-defined, consistent result. It is thus by definition impossible
to implement the requirement interoperably. Such an algorithm would need
to define

  * what is considered a "non-Unicode character encoding"
  * when an IRI is considered to be in such an encoding
  * what happens if the IRI comes from a distinct textual data object
  * which version of NFC is to be used
  * etc.

none of which RFC 3987 clearly defines. Much worse, it assumes that it
is possible to get from the "IRI" to information about the digital
encoding of it; most deployed architecture, however, assumes that this
is not relevant.

Note that Ian said it is nigh-on impossible to implement this *sanely*,
not to implement it at all. I strongly agree with that and have yet to
see evidence to the contrary.
It is in my opinion highly unrealistic to expect implementers to add
this complexity to their implementations (possibly breaking existing
content) just to cater for a few users who create content that

  * uses some legacy encoding against advice
  * is not NFC-normalized against advice
  * breaks in many implementations
  * breaks in many usage scenarios
  * etc.

Anyway, maybe you could explain how adopting IRIs in CSS would not make
the specification non-conforming to http://www.w3.org/TR/charmod/#C014?
My understanding of item 3 ("this MUST be equivalent to transcoding the
data object to some Unicode encoding form, adjusting any character
encoding label if necessary, and receiving it in that Unicode encoding
form") is that processing of

  @charset "iso-8859-1";
  element { background-image: url(Bjo\000308rn) }

must be defined such that it is equivalent to processing

  @charset "utf-8";
  element { background-image: url(Bjo\000308rn) }

while my understanding of RFC 3987 is that it must not be equivalent.

>But such feedback should not be used to throw
>out the baby with the bathwater. Making it clear that CSS interprets
>IRIs using UTF-8, rather than the encoding of the document (such
>implementations still exist, although you claim that getting at
>that encoding is "nigh-on-impossible") is a very high priority.

I do not know of any CSS implementation that does this for e.g.

  @charset "us-ascii";
  element { background-image: url(Björn) }

as that is obviously impossible. So these implementations do something
different from what you think they do.
When I tested this 2 years ago, my results were

  * Internet Explorer 6.0 SP1 for Windows
    -> fails tests containing \F6 in path/query or björn in query:

       url(bj\F6rn)        => bj/F6rn
       url(björn?bj\F6rn)  => bj%C3%B6rn?bj\F6rn
       url(björn?björn)    => bj%C3%B6rn?bj<F6>rn (<F6> is byte 0xF6)

  * Opera 7.11 for Windows
    -> fails tests containing unescaped/unquoted 'ö's:

       url(björn)          => ignored
       url(bj\F6rn?björn)  => ignored

  * Amaya 7.0 for Windows
    -> fails all tests:

       url(björn)          => bj%f6rn
       url("björn")        => bj%f6rn
       url('björn')        => 'bj%f6rn'
       url('björn#björn')  => 'bj%f6rn

  * Mozilla 1.3a for Windows
    -> passes all tests

(IIRC MacIE failed all tests as well, but for different reasons, and
Safari passed all or most tests, where "all tests" can be derived from
the cited cases in all possible variations; this did not include
NFD/NFC tests, but more recent tests indicate all of those would fail.)

What do the implementations you mention do if the style sheet is
changed through scripting, like

  var ss = document.styleSheets.item(0);
  var ln = ss.cssRules.length;
  ss.insertRule("#test1 { background-image: url(Björn) }", ln);
  ss.insertRule("#test2 { background-image: url(Bjo\u0308rn) }", ln);

from internal/external scripts and style sheets in different encodings?
I do not know what RFC 3987 might require in this case, but it seems
unlikely they would behave as you describe. But as you apparently wrote
tests for this, maybe you could contribute them to the CSS Working
Group?

Proper IRI testing would require thousands of tests (NFC, IDN, error
handling, cross-technology tests involving at least the CSS DOM and
fragment identifier unescaping e.g. when using some SVG fragment as
background-image, dealing with base IRIs, many character encodings,
character encoding scheme detection e.g. when the encoding scheme is
determined by a charset attribute on an <a> element three documents
ago, etc.); it would sure help a lot to have a complete test suite,
both to implement this in browsers and to properly specify it.
Proposals for text to include in CSS 3 would help too; we need to
specify e.g. what happens if a string is not a proper IRI: is it
considered an illegal value and ignored per CSS, should it be defined
as in SVG, where implementations are not required to check for
malformed IRIs but rather implement random behavior instead, or should
we define generic error recovery requirements?
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Saturday, 7 May 2005 14:32:35 UTC