Re: Should the UTF-8 BOM trump overriding via HTTP or by users?

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 07 Jun 2011 16:56:29 +0200
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Cc: www-international <www-international@w3.org>
Message-ID: <ftdsu65ajebgfdpbnrgv01c3dtiov4mf7i@hive.bjoern.hoehrmann.de>
* Leif Halvard Silli wrote:
>Bjoern Hoehrmann, Tue, 07 Jun 2011 06:39:34 +0200:
>> Higher-level information overrides lower-level information, explicit
>> information overrides fallbacks, and user agents should do what their
>> users want them to do. So, HTTP-level Content-Type overrides document-
>> internal information, a BOM overrides user-chosen fallbacks, and user-
>> chosen overrides trump anything else.
>
>You portray the BOM as a "fallback". It actually is an encoding
>signature.

If you think I wrote something that is inconsistent with the facts, then
maybe you misread what I wrote? I did not portray, and did not mean to
portray, a Unicode signature as a fallback in the sense I used the word.
I meant fallback in the sense of an "If the page lacks an encoding
declaration, assume it's $encoding encoded" setting, as opposed to a
"Whatever the page says it's encoded in, use $encoding to decode" setting.
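
To make the ordering concrete, here is a minimal sketch of that precedence, assuming a hypothetical choose_encoding() helper. The parameter names and the windows-1252 default are made up for illustration and are not taken from any user agent or specification.

```python
import codecs

# Byte sequences that are treated as a Unicode signature in this sketch.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def choose_encoding(body=b"", http_charset=None, user_override=None,
                    user_fallback="windows-1252"):
    """Pick an encoding label using the precedence described above.

    All parameters are hypothetical names for this illustration:
    user_override is the "whatever the page says, use $encoding" setting,
    user_fallback is the "if nothing is declared, assume $encoding" setting.
    """
    # 1. A user-chosen override trumps anything else.
    if user_override:
        return user_override
    # 2. Higher-level (HTTP) information overrides document-internal information.
    if http_charset:
        return http_charset
    # 3. A signature in the document overrides the user-chosen fallback.
    for signature, label in SIGNATURES:
        if body.startswith(signature):
            return label
    # 4. Only now does the fallback apply.
    return user_fallback
```

With this sketch, a body starting with EF BB BF and no other information yields "utf-8" even if the fallback is windows-1252, while the same body with user_override="koi8-r" yields "koi8-r".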

>"Looks like a BOM". Looks like or are exactly those bytes? Can you
>describe a use case? When and how can an XML document/entity legally
>start with the BOM if it is not meant to be interpreted as the BOM?

"Looks like" as opposed to "is defined as".

  Content-Type: application/xml-external-parsed-entity;charset=l1

  0xFE 0xFF

That's a properly formed external parsed entity containing LATIN SMALL
LETTER THORN and LATIN SMALL LETTER Y WITH DIAERESIS. If you ignore the
charset parameter, the bytes may look like a Unicode signature, but the
bytes are not a Unicode signature because they are not defined as such.
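
The same point as a short Python sketch: it decodes the two bytes as ISO-8859-1 (for which "l1" is a registered alias) and separately checks whether they coincide with the UTF-16 big-endian signature. The variable names are illustrative only.

```python
import codecs
import unicodedata

# The complete entity body from the example above.
data = bytes([0xFE, 0xFF])

# Honouring the charset parameter: decode as ISO-8859-1.
text = data.decode("iso-8859-1")
names = [unicodedata.name(ch) for ch in text]

# Ignoring the charset parameter: the bytes merely coincide with the
# UTF-16 big-endian signature.
looks_like_bom = data.startswith(codecs.BOM_UTF16_BE)

print(names)
print(looks_like_bom)
```

Decoded as declared, the entity contains two ordinary characters. The prefix check succeeds only because it never consults the declared encoding.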
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Tuesday, 7 June 2011 14:56:55 GMT