CSS validator mystery symbol: ÿþ, UTF-16 BOM

Paul Coombe wrote to the W3C CSS-validator list 
<mailto:www-validator-css@w3.org> on 6 December 2004 in “css validator 
mystery symbol” (<mid:41B4E781.2060605@sympatico.ca>, 
<http://www.w3.org/mid/41B4E781.2060605@sympatico.ca>):

> When using the W3C CSS validator I got an error that showed a y with 2 
> dots above and a combination pb. This is the closest I could come to a 
> reproduction.

Then came a graphic depicting glyphs for the character sequence “ÿþ” 
(<Latin small letter y with diaeresis, Latin small letter thorn>, 
<U+00FF U+00FE>).

> What does it mean?

It was almost certainly supposed to be an encoding signature, flagging 
the encoding of the text as little-endian UTF-16.

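In an encoding that represents each character with a single eight-bit 
byte, such as ISO-8859-1, the two characters “ÿþ” are the bytes 
<FF FE>. The character zero width no-break space (U+FEFF) has the 
semantics of a byte-order mark (known as “BOM”), or encoding signature. 
When serialized as little-endian UTF-16, the BOM yields exactly those 
bytes, <FF FE>. The validator, it seems, read the style sheet as if it 
were in a single-byte encoding and so reported the two bytes of the BOM 
as two separate characters.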
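A few lines of Python reproduce the effect. The language is my choice for 
illustration. The sketch assumes only what is described above: the BOM is 
written as little-endian UTF-16, and the bytes are then read back as 
ISO-8859-1 (Latin-1), one character per byte.

```python
# The byte-order mark is the character U+FEFF (zero width no-break space).
BOM = "\ufeff"

# Serialized as little-endian UTF-16, it becomes the two bytes FF FE.
bom_bytes = BOM.encode("utf-16-le")
print(bom_bytes)  # b'\xff\xfe'

# A reader that assumes one byte per character (here ISO-8859-1, i.e.
# Latin-1) turns those two bytes into two characters.
misread = bom_bytes.decode("latin-1")
print(misread)  # ÿþ
print([f"U+{ord(c):04X}" for c in misread])  # ['U+00FF', 'U+00FE']
```

The last line shows the code points <U+00FF U+00FE>, Latin small letter y 
with diaeresis and Latin small letter thorn, which is what the validator 
displayed.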

The solution that first comes to mind is to use better authoring 
software. A good text editor will let the author choose the encoding in 
which to save and, in the case of the UTF encodings, whether to put a 
BOM at the start of the text.
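Here is a rough sketch of what such an editor does when saving, and of 
how a reader can recognize the result. It uses the byte-order-mark 
constants from Python's standard `codecs` module. The helper names 
`save_stylesheet`, `detect_bom` and `read_stylesheet` are hypothetical, 
as is the choice of UTF-8 as the fallback when no BOM is present. They 
illustrate the idea and do not describe any particular editor or the 
validator.

```python
import codecs
from typing import Optional

# Hypothetical table of the UTF encodings an editor might offer,
# each paired with its byte-order mark (encoding signature).
BOMS = {
    "utf-8": codecs.BOM_UTF8,          # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
    "utf-16-be": codecs.BOM_UTF16_BE,  # FE FF
}

def save_stylesheet(text: str, encoding: str, with_bom: bool) -> bytes:
    """Encode text as an editor might save it, optionally prefixed by a BOM."""
    body = text.encode(encoding)
    return BOMS[encoding] + body if with_bom else body

def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding signalled by a leading BOM, or None if absent.

    This sketch ignores UTF-32: a UTF-32LE file (BOM FF FE 00 00) would
    be misreported here as UTF-16LE.
    """
    for name, bom in BOMS.items():
        if data.startswith(bom):
            return name
    return None

def read_stylesheet(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if present (stripping it), else the fallback."""
    name = detect_bom(data)
    if name is None:
        return data.decode(fallback)
    return data[len(BOMS[name]):].decode(name)

css = "p { color: red }"
saved = save_stylesheet(css, "utf-16-le", with_bom=True)
print(saved[:2])                      # b'\xff\xfe'
print(detect_bom(saved))              # utf-16-le
print(read_stylesheet(saved) == css)  # True
```

A reader that checks for the signature first, as `read_stylesheet` does, 
never shows the BOM as text. A reader that assumes a single-byte encoding 
produces the “ÿþ” seen in the validator's report.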

-- 
Etan Wexler.

Received on Thursday, 7 July 2005 04:33:18 UTC