Leading BOM

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sat, 26 May 2007 00:32:52 +0300
Message-Id: <852E9F41-65DD-4F4E-B31B-891FC976E759@iki.fi>
To: HTML WG <public-html@w3.org>

The draft says:
"A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if present."

That's reasonable for UTF-8 when the encoding has been established by
other means.

However, when the encoding is UTF-16LE or UTF-16BE (i.e. supposed to
be signatureless), do we really want to drop the BOM silently?
Shouldn't it count as a character that is in error?

Likewise, if an encoding signature BOM has been discarded and the
first logical character of the stream is another BOM, shouldn't that
also count as a character that is in error?
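
To make the two cases concrete, here is a rough Python sketch of the
stricter behavior being asked about. It is only an illustration, not
what the draft specifies: the function name decode_checking_bom, the
error strings, and the choice to drop the offending U+FEFF after
reporting it are all assumptions made for this example.

```python
import codecs

# Labels that are defined as signatureless, mapped to Python codec names.
SIGNATURELESS = {"utf-16le": "utf-16-le", "utf-16be": "utf-16-be"}

# Byte sequences treated as an encoding signature in this sketch.
BOM_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_checking_bom(data, declared=None):
    """Decode data and return (text, errors).

    Hypothetical helper: a leading U+FEFF is reported as an error
    instead of being dropped silently when it cannot be a signature.
    """
    errors = []
    label = declared.lower() if declared else None

    if label in SIGNATURELESS:
        # Case 1: UTF-16LE / UTF-16BE have no signature, so a leading
        # U+FEFF is an ordinary character. Assumption: report it, then drop it.
        text = data.decode(SIGNATURELESS[label])
        if text.startswith("\ufeff"):
            errors.append("U+FEFF at start of signatureless %s stream" % label)
            text = text[1:]
        return text, errors

    # Otherwise accept one BOM as a signature, provided it agrees with
    # the declared encoding (or nothing was declared).
    codec = label or "utf-8"
    for bom, sniffed in BOM_SIGNATURES:
        if data.startswith(bom) and label in (None, sniffed):
            codec = sniffed
            data = data[len(bom):]
            break

    text = data.decode(codec)
    if text.startswith("\ufeff"):
        # Case 2: the signature is already gone and the first logical
        # character is another BOM.
        errors.append("second U+FEFF after the encoding signature")
        text = text[1:]
    return text, errors
```

With this sketch, UTF-8 input that carries a single signature decodes
with an empty error list, while a BOM at the start of a stream labeled
UTF-16LE, or a doubled BOM, yields one reported error each.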

Henri Sivonen
Received on Friday, 25 May 2007 21:33:09 UTC
