Re: UTF-8 signature / BOM in CSS


I am not sure I agree with stripping noncharacters. I would rather
reject documents with junk in them than silently clean them up.
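
To make "reject rather than clean up" concrete, here is a minimal sketch
in Python. The function names are mine and purely illustrative; nothing
like this appears in any CSS draft. It assumes the style sheet has
already been decoded to a string, and it uses the Unicode definition of
noncharacters: U+FDD0..U+FDEF plus the last two code points of every
plane.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def check_no_noncharacters(text: str) -> None:
    """Raise instead of silently stripping (hypothetical policy helper)."""
    for index, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            raise ValueError(
                f"noncharacter U+{ord(ch):04X} at offset {index}"
            )
```

Note that U+FEFF itself is not a noncharacter, so a check like this
leaves the BOM question open.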

In the case of the UTF-8 BOM, I would not object to simply stripping it,
but it seems odd not to use the information it gives about the
document's encoding, and odder still not to use the byte-order
(endianness) information in a UTF-16 encoded document. Stripping the BOM
from a UTF-16 CSS document without using it would throw that useful
information away.
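
As a rough illustration of using the BOM rather than merely discarding
it, here is a small Python sketch. The helper name decode_css and the
UTF-8 fallback are my assumptions, not anything CSS2.1 specifies, and
UTF-32 is deliberately left out to keep it short. It relies on Python's
standard "utf-16" codec, which reads the BOM to pick the byte order, and
on "utf-8-sig", which drops a leading UTF-8 signature.

```python
import codecs


def decode_css(data: bytes) -> str:
    """Decode a style sheet, treating a leading BOM as an encoding hint."""
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" removes the signature instead of yielding U+FEFF.
        return data.decode("utf-8-sig")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec uses the BOM to choose endianness.
        return data.decode("utf-16")
    # Assumed fallback for this sketch only.
    return data.decode("utf-8")
```

With a plain "utf-8" decode the signature survives as U+FEFF at the
start of the text, which is exactly the case where it ends up being
treated as part of the style sheet.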

To answer your question about other BOMs: they are all based on U+FEFF,
but besides UTF-8 they also exist for UTF-16 (big- and little-endian),
UTF-32 (likewise), and SCSU (Unicode compression). Only the encoded byte
sequence differs.
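
For reference, the byte sequences can be put in a small lookup table.
The following Python sketch is only illustrative: sniff_bom and
BOM_TABLE are names I made up, and the "scsu" label is just a tag, since
Python ships no SCSU codec. The UTF constants come from the standard
codecs module; the SCSU signature 0E FE FF is written out by hand.

```python
import codecs

# Longer signatures are listed first so that UTF-32 LE (FF FE 00 00)
# is not mistaken for UTF-16 LE (FF FE).
BOM_TABLE = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (b"\x0e\xfe\xff", "scsu"),           # 0E FE FF (label only, no codec)
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_bom(data: bytes):
    """Return (encoding label, BOM length), or (None, 0) if no BOM."""
    for bom, name in BOM_TABLE:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

One known ambiguity: a UTF-16 LE document whose first character after
the BOM is U+0000 has the same leading bytes as a UTF-32 LE BOM, and
this table would report it as UTF-32 LE.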

I have a list with more detail here:

and the Unicode Consortium has a FAQ on UTF-8 and the BOM at:


Etan Wexler wrote:
> Richard Ishida wrote to <>,
> <>, and <> on 2
> December 2003 in "RE: UTF-8 signature / BOM in CSS"
> (<mid:005301c3b8e4$1d862250$6501a8c0@w3c40upc3ma3j2>):
> > I wonder whether CSS can introduce a change to CSS2.1 at this stage to
> > clarify that the BOM - particularly any UTF-8 signature - should not be
> > considered part of the following text.
> I'd like to see such a revision made.
> CSS specifications should mandate a preparation phase for CSS
> consumption. In this phase, a CSS engine would strip an initial BOM, if
> present, and strip all noncharacters. After this phase, a clean stream
> of Unicode characters gets passed to the tokenizer; parsing proceeds as
> specified in the grammar.
> By the way, what UTF-8 signatures exist besides U+FEFF?
> --
> Etan Wexler.
> (Sorry about the character munging in my original message. And sorry
> about using my unsubscribed address, thus splitting the thread. I'm
> reconnecting with www-style.)

Tex Texin   cell: +1 781 789 1898
Xen Master                
Making e-Business Work Around the World

Received on Tuesday, 2 December 2003 23:28:33 UTC