- From: Tex Texin <tex@xencraft.com>
- Date: Mon, 19 Jul 2004 16:04:16 -0400
- To: Mark Moore <mark.moore@notlimited.com>
- Cc: www-style@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>
Well, if it was for the list I might have chosen my words more carefully, and
perhaps should have even for the private mail. It is sometimes hard to know
when to be thorough and when to just stick to the context. I opted to provide
more info so you weren't surprised later.

I listed some other forms of UTF. Neither CESU nor UTF-8-EBCDIC will appear in
CSS. They are not used for interchange.

If you want to worry about SCSU, we should also consider other compression
formats and worry about zipped or tar'd CSS files. SCSU straddles the line
between a character encoding and a file encoding.

There are several encodings that are neither Unicode-based nor ASCII- or
EBCDIC-based. However, as they do not contain the English letters needed for
CSS keywords, they won't be used with CSS.

The reluctance, if any, to tighten things up has to do with understanding what
is being ruled out. There may be some encoding, useful for some languages,
which would be unnecessarily ruled out. If CSS wants to support native
encodings, it shouldn't be arbitrarily restrictive because most of us are
ignorant of these things.

On the other hand, I wouldn't object if CSS simply required Unicode for CSS
and eliminated the ambiguity of encoding declaration or detection. But that's
not in the cards.

So, the rules are a little ugly, but really are not much of a burden.

Anyway, I am a bit under the weather so I cc'd the i18n group in case someone
else wants to jump in here or in case I am not writing clearly.

tex

Mark Moore wrote:
>
> > ...I'm not terribly familiar with Unicode beyond UTF-8 and UTF-16.
> > Are there any significant encodings that mess with the lower code points?
> >
> > not really. You know utf-8, 16, 32.
> > Just for your info:
> >
> > There is a variation of utf-8 called CESU.
> > It turns out utf-8 orders surrogate characters differently from utf-16.
> > CESU is utf-8 but preserves the order of surrogates.
> > http://www.unicode.org/unicode/reports/tr26/
> >
> > There is also a utf-8-ebcdic, but it is not for use "on the wire" and just
> > internal to ebcdic systems.
> > http://www.unicode.org/unicode/reports/tr16/
>
> Never being on the wire doesn't protect CSS implementations, assuming there
> is ever a use for "native" CSS implementations on EBCDIC systems. Right?
>
> > Finally there is scsu- which is a compressed form of unicode and has its
> > own bom identifier.
> > http://www.unicode.org/unicode/reports/tr6/
> >
> > But these are not going to crop up for xml or css.
> > So it's basically the UTF's, ascii, and ebcdic.
> > tex
>
> I sure wish the owners of the CSS spec would just come out and say this. If
> this is the case (which I believe), they should just say so. I don't
> understand the reluctance to tighten things up.
>
> -MM
>
> PS. I CC'd www-style since your info may be helpful to others. Hope you
> don't mind...

--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com

XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Monday, 19 July 2004 16:05:10 UTC
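
The ordering difference behind CESU mentioned in the quoted text can be seen
with one character outside the BMP. This is a minimal sketch, assuming Python 3.
`cesu8_encode` is a hypothetical helper written for illustration (Python has no
built-in CESU-8 codec): it encodes each UTF-16 code unit, surrogates included,
as its own UTF-8-style sequence, which is how TR26 describes the format.

```python
import struct


def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text as CESU-8.

    Each UTF-16 code unit (including each half of a surrogate pair)
    is written as a separate UTF-8-style byte sequence.
    """
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for (unit,) in struct.iter_unpack(">H", utf16):
        # "surrogatepass" lets Python emit bytes for a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


bmp_char = "\uff5e"          # a character near the top of the BMP
supp_char = "\U00010000"     # the first supplementary character

# Plain UTF-8 uses one 4-byte sequence; CESU-8 uses two 3-byte sequences.
utf8_bytes = supp_char.encode("utf-8")    # b"\xf0\x90\x80\x80"
cesu8_bytes = cesu8_encode(supp_char)     # b"\xed\xa0\x80\xed\xb0\x80"

# Binary sort order under each encoding form.
chars = [bmp_char, supp_char]
by_utf8 = sorted(chars, key=lambda c: c.encode("utf-8"))
by_utf16 = sorted(chars, key=lambda c: c.encode("utf-16-be"))
by_cesu8 = sorted(chars, key=cesu8_encode)
```

Here `by_utf8` puts U+FF5E first (code point order), while `by_utf16` and
`by_cesu8` both put U+10000 first, because its leading surrogate D800 sorts
below FF5E. Characters in the ASCII range come out identical in UTF-8 and
CESU-8, so the difference does not touch the lower code points.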