Re: UTF-8 signature / BOM in CSS from Mark Davis on 2003-12-08 (www-style@w3.org from December 2003)

From: Mark Davis <mark.davis@jtcsv.com>
Date: Sun, 7 Dec 2003 23:54:00 -0500 (EST)
To: <ernestcline@mindspring.com>, François Yergeau <francois@yergeau.com>
Cc: <www-style@w3.org>, <w3c-i18n-ig@w3.org>, "Etan Wexler" <ewexler@stickdog.com>
Message-ID: <001c01c3bd47$496131a0$7900a8c0@DAVIS1>

This conversation has meandered around a while, so I'm not sure what the
questions at hand are.

For historical reasons, U+FEFF had two functions: one as a BOM (only when
initial), and the other as a line-joining character (preventing a line break). The
latter function is now taken by another character, U+2060 WORD JOINER, and over
time we want software to transition towards that other character so that U+FEFF
has only a single function, and so that any non-initial usage can be recognized
as the result of some failure somewhere to remove an initial BOM. Even though the
use of U+FEFF is quite limited in practice, the transition will take some time.
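
As a rough illustration of that end state (a minimal sketch only, not taken from
any spec; the name strip_initial_bom is made up for this example), a consumer
could drop an initial U+FEFF and report any that remain as suspect:

```python
BOM = "\ufeff"

def strip_initial_bom(text):
    """Return (text without a leading U+FEFF, indexes of any stray U+FEFF).

    Hypothetical helper: the indexes refer to the returned text, and what a
    caller does with the stray occurrences is left open.
    """
    if text.startswith(BOM):
        text = text[1:]  # an initial U+FEFF is a signature, not content
    stray = [i for i, ch in enumerate(text) if ch == BOM]
    return text, stray
```

A non-empty second value would suggest that an initial BOM was not removed
somewhere upstream, for instance when files were concatenated.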

However, the main point is that, with either function, U+FEFF is unsuited for
use in identifiers, as are all of the Cf (format) characters. The paths I see are:
(a) filter them all out, or
(b) treat them as illegal characters, and reject the entity
(statement/block/file*) that they are in, or
(c) treat them as 'whitespace' in parsing (e.g. not being part of identifiers,
but separating them: abc&#xFEFF;def would be treated as the sequence of 2
identifiers abc and def).
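
To make the three paths concrete, here is a minimal sketch in Python. It is an
assumption-laden toy, not a CSS tokenizer: the function names are invented for
this example, "Cf" is tested through the standard unicodedata module, and for
(c) ordinary whitespace is treated as a separator as well.

```python
import unicodedata

def is_format_char(ch):
    # General Category Cf covers U+FEFF, U+2060, U+200C, U+200D, and others.
    return unicodedata.category(ch) == "Cf"

def filter_out(text):
    """Path (a): silently drop every Cf character."""
    return "".join(ch for ch in text if not is_format_char(ch))

def reject(text):
    """Path (b): treat any Cf character as illegal and refuse the input."""
    for i, ch in enumerate(text):
        if is_format_char(ch):
            raise ValueError(f"illegal format character U+{ord(ch):04X} at index {i}")
    return text

def split_as_whitespace(text):
    """Path (c): Cf characters separate identifiers, like whitespace."""
    parts, current = [], []
    for ch in text:
        if is_format_char(ch) or ch.isspace():
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts
```

For the input abc, U+FEFF, def: filter_out gives the single identifier abcdef,
reject raises an error, and split_as_whitespace gives the two identifiers abc
and def.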

Mark
__________________________________
http://www.macchiato.com
► शिष्यादिच्छेत्पराजयम् ◄

----- Original Message ----- 
From: "Ernest Cline" <ernestcline@mindspring.com>
To: "François Yergeau" <francois@yergeau.com>
Cc: <www-style@w3.org>; <w3c-i18n-ig@w3.org>; "Etan Wexler"
<ewexler@stickdog.com>
Sent: Sun, 2003 Dec 07 18:44
Subject: Re: UTF-8 signature / BOM in CSS


> > [Original Message]
> > From: François Yergeau <francois@yergeau.com>
> >
> > Ernest Cline wrote:
> > > Making stuff that was acceptable earlier
> > > unacceptable should only be done when there is a compelling
> > > reason to do so.  Other than a theological debate over whether it is
> > > a character, I see no reason to do so, and that reason is not compelling
> > > to me.
> >
> > Nor to me.  But a much stronger reason for wanting U+FEFF excluded from
> > identifiers is that it is now deprecated in Unicode, because of the
> > ambiguity of its role as a BOM or a ZWNBSP.  Unicode has introduced
> > U+2060 to play the latter role and recommends to use it exclusively.
> > That's about as much a Good Idea as equating the BOM and ZWNBSP
> > was a Bad Idea, and it would be nice if CSS could take heed.
>
> I can't say that I agree with that reasoning either.  What about the ten
> fully deprecated characters?  What happens when more Unicode characters
> are deprecated?  And finally consider this quote from definition D7a of
> the Unicode standard (Section 3.4 Characters and Encoding)
>
> Deprecated characters are _retained_ (emphasis mine) in the standard
> so that previously conforming data stay conformant in future versions
> of the standard,
>
> Given the total lack of any ability to indicate which version of CSS
> was used, treating U+FEFF or the ten fully deprecated characters
> differently in CSS seems to me to be a bad idea as it would make
> previously conforming CSS no longer conforming, which is clearly
> not the intent in deprecating a Unicode character.

Received on Monday, 8 December 2003 05:14:56 UTC