Re: Should the UTF-8 BOM trump overriding via HTTP or by users?

* Leif Halvard Silli wrote:
>Bjoern Hoehrmann, Tue, 07 Jun 2011 06:39:34 +0200:
>> Higher-level information overrides lower-level information, explicit
>> information overrides fallbacks, and user agents should do what their
>> users want them to do. So, HTTP-level Content-Type overrides document-
>> internal information, a BOM overrides user-chosen fallbacks, and user-
>> chosen overrides trump anything else.
>
>You portray the BOM as a "fallback". It actually is an encoding
>signature.

If you think I wrote something that is inconsistent with the facts,
then maybe you misread what I wrote? I did not, and did not mean to,
portray a Unicode signature as a fallback in the sense in which I used
the word. By fallback I meant a setting like "If the page lacks an
encoding declaration, assume it is encoded in $encoding", as opposed
to an override setting like "Whatever the page says it is encoded in,
use $encoding to decode it".
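
As a rough illustration of that ordering (user-chosen override first,
then the HTTP-level charset, then a Unicode signature, then the
user-chosen fallback), here is a minimal Python sketch. The names
choose_encoding, sniff_signature, user_override, http_charset and
user_fallback are hypothetical and not taken from any specification or
browser. The sketch ignores UTF-32 signatures and in-document
declarations to stay short.

```python
import codecs

# Hypothetical table of Unicode signatures; UTF-32 is left out for brevity.
_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_signature(data):
    """Return the encoding implied by a leading signature, or None."""
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name
    return None


def choose_encoding(data, http_charset=None, user_override=None,
                    user_fallback="windows-1252"):
    """Pick an encoding using the precedence described in this message."""
    # "Whatever the page says, use $encoding": trumps everything else.
    if user_override:
        return user_override
    # Higher-level (HTTP) information overrides document-internal bytes.
    if http_charset:
        return http_charset
    # A signature overrides the user-chosen fallback.
    signature = sniff_signature(data)
    if signature:
        return signature
    # "If the page lacks a declaration, assume $encoding".
    return user_fallback
```

With no other information, choose_encoding(b"\xef\xbb\xbfabc") yields
"utf-8", while passing http_charset="iso-8859-1" for the bytes
0xFE 0xFF yields "iso-8859-1" because the charset parameter wins.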

>"Looks like a BOM". Looks like or are exactly those bytes? Can you
>describe a use case? When and how can an XML document/entity legally
>start with the BOM if it is not meant to be interpreted as the BOM?

"Looks like" as opposed to "defined as". Consider:

  Content-Type: application/xml-external-parsed-entity;charset=l1

  0xFE 0xFF

That's a properly formed external parsed entity containing LATIN SMALL
LETTER THORN and LATIN SMALL LETTER Y WITH DIAERESIS. If you ignore the
charset parameter, the bytes may look like a Unicode signature, but the
bytes are not a Unicode signature because they are not defined as such.
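
The point can be checked with a few lines of Python; this is only a
sketch using the standard library, with "iso-8859-1" standing in for
the "l1" charset label used above.

```python
import codecs
import unicodedata

data = bytes([0xFE, 0xFF])

# Honouring the charset parameter: two ordinary Latin-1 characters.
text = data.decode("iso-8859-1")
names = [unicodedata.name(ch) for ch in text]
print(names)
# ['LATIN SMALL LETTER THORN', 'LATIN SMALL LETTER Y WITH DIAERESIS']

# Ignoring the charset parameter, the same bytes merely look like the
# UTF-16 big-endian signature.
looks_like_bom = data == codecs.BOM_UTF16_BE
print(looks_like_bom)  # True
```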
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Tuesday, 7 June 2011 14:56:55 UTC