Re: @charset rule from Tex Texin on 2004-07-19 (www-style@w3.org from July 2004)

From: Tex Texin <tex@xencraft.com>
Date: Mon, 19 Jul 2004 16:04:16 -0400
To: Mark Moore <mark.moore@notlimited.com>
Cc: www-style@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>
Message-ID: <40FC2940.3D83171A@xencraft.com>
Well, had I known it was for the list, I might have chosen my words more
carefully, and perhaps I should have even for private mail.

It is sometimes hard to know when to be thorough and when to just stick to the
context. I opted to provide more info so you wouldn't be surprised later, which
is why I listed some other forms of UTF.

Neither CESU nor utf8-ebcdic will appear in CSS. They are not used for
interchange.
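
To make the CESU difference concrete, here is a minimal sketch (not taken from
TR26; the function name cesu8_encode is made up for illustration). It assumes
only the basic idea of CESU: a supplementary character is first split into its
utf-16 surrogate pair, and each surrogate is then written as a 3-byte utf-8
style sequence.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (illustrative sketch, hypothetical helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP characters are encoded exactly as in UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Split the supplementary code point into a UTF-16 surrogate pair,
            # then encode each surrogate as its own 3-byte sequence.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
    return bytes(out)


bmp_char = "\uFFFD"        # a BMP character above the surrogate range
supp_char = "\U00010000"   # the first supplementary character

# Binary order in UTF-8 follows code point order ...
utf8_order = bmp_char.encode("utf-8") < supp_char.encode("utf-8")
# ... while UTF-16 and CESU-8 both put the supplementary character first.
utf16_order = bmp_char.encode("utf-16-be") < supp_char.encode("utf-16-be")
cesu8_order = cesu8_encode(bmp_char) < cesu8_encode(supp_char)
```

With these two sample characters, utf8_order comes out True while utf16_order
and cesu8_order both come out False, which is the ordering difference CESU was
designed to preserve.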

If you want to worry about SCSU, we should also consider other compression
formats and worry about zipped or tar'd css files.
SCSU straddles the line between a character encoding and a file encoding.

Several encodings exist that are neither Unicode-based nor ASCII- or
EBCDIC-based.
However, as they do not contain the English letters needed for CSS keywords,
they won't be used with CSS.

The reluctance, if any, to tighten things up comes from not knowing exactly
what would be ruled out.
There may be some encoding, useful for some languages, which would be
unnecessarily ruled out. If CSS wants to support native encodings, it shouldn't
be arbitrarily restrictive just because most of us are ignorant of these
things. On the other hand, I wouldn't object if CSS simply required Unicode for
style sheets and eliminated the ambiguity of encoding declaration or detection.

But that's not in the cards. So, the rules are a little ugly, but really are
not much of a burden.
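
For a rough idea of what the detection amounts to, here is a simplified sketch.
It is not the algorithm from the CSS spec: it assumes only a BOM check followed
by a literal ASCII @charset rule at the very start of the file, and it skips
the EBCDIC and utf-16/32-encoded forms of the rule. The function name, the
default, and the "scsu" label are made up for illustration ("scsu" is just a
label here, not a Python codec).

```python
import codecs
import re

# Longer signatures are listed first, because the UTF-32-LE BOM begins with
# the same two bytes as the UTF-16-LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (b"\x0e\xfe\xff", "scsu"),       # SCSU signature
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# An @charset rule written in an ASCII-compatible encoding, at offset 0.
CHARSET_RULE = re.compile(rb'@charset "([^"]*)";')


def sniff_css_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess a style sheet's encoding from its first bytes (sketch only)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    match = CHARSET_RULE.match(data)
    if match:
        return match.group(1).decode("ascii", "replace").lower()
    # Hypothetical fallback; a real user agent would consult HTTP headers
    # and the referring document before giving up.
    return default
```

For example, sniff_css_encoding(b'@charset "ISO-8859-1"; body {}') yields
"iso-8859-1", and a file that starts with the bytes FF FE followed by ordinary
utf-16 text is reported as "utf-16-le".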

Anyway, I am a bit under the weather so I cc'd the i18n group in case someone
else wants to jump in here or in case I am not writing clearly.

tex


Mark Moore wrote:
> > ...I'm not terribly familiar with Unicode beyond UTF-8 and UTF-16. 
> > Are there any significant encodings that mess with the lower code points?
> >
> > not really. You know utf-8, 16, 32.
> > Just for your info:
> >
> > There is a variation of utf-8 called CESU.
> > It turns out utf-8 sorts supplementary characters (the ones utf-16
> > encodes as surrogate pairs) differently from utf-16. CESU is utf-8, but
> > it preserves the utf-16 order by encoding the surrogates themselves.
> > http://www.unicode.org/unicode/reports/tr26/
> >
> > There is also a utf-8-ebcdic, but it is not for use "on the wire" and just
> > internal to ebcdic systems.
> > http://www.unicode.org/unicode/reports/tr16/
> 
> Never being on the wire doesn't protect CSS implementations, assuming there
> is ever a use for "native" CSS implementations on EBCDIC systems.  Right?
> 
> > Finally there is scsu- which is a compressed form of unicode and has its
> > own
> > bom identifier.
> > http://www.unicode.org/unicode/reports/tr6/
> >
> > But these are not going to crop up for xml or css.
> > So it's basically the UTF's, ascii, and ebcdic.
> > tex
> 
> I sure wish the owners of the CSS spec would just come out and say this.  If
> this is the case (which I believe), they should just say so.  I don't
> understand the reluctance to tighten things up.
> 
> -MM
> 
> PS. I CC'd www-style since your info may be helpful to others.  Hope you
> don't mind...

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft		            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
Received on Monday, 19 July 2004 16:05:10 UTC