Re: byte order mark article

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 22 Nov 2012 03:03:48 +0100
To: John Cowan <cowan@mercury.ccil.org>
Cc: Anne van Kesteren <annevk@annevk.nl>, www-international@w3.org
Message-ID: <20121122030348392503.870d84aa@xn--mlform-iua.no>
John Cowan, Wed, 21 Nov 2012 20:45:15 -0500:
> Leif Halvard Silli scripsit:

>> Second: When there is an external declaration which says "UTF-16",
>>         then the requirement to include a BOM is relaxed. The parser
>>         could e.g. default to UTF-16LE, as Unicode says.
> 
> It does not default to the UTF-16LE encoding, but to the UTF-16 encoding
> with little-endian interpretation.

  (Except that it, per Unicode, defaults to big endian, sorry.)
  
> These are two different things, though
> often confused.

Well, yes. And no. Isn't the BOM part of the UTF-16 encoding? If yes, 
then in a way it is more correct to say that it defaults to UTF-16BE. 
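
To make the distinction concrete, here is a minimal Python sketch. The two helper names are hypothetical, and the no-BOM fallback to big endian is the Unicode default mentioned above, not necessarily what any given parser does. Under the "UTF-16" label a leading BOM selects the byte order and is consumed; under the "UTF-16BE" label the same two bytes are ordinary content (U+FEFF).

```python
import codecs


def decode_utf16_label(data: bytes) -> str:
    """Decode bytes labelled "UTF-16" (hypothetical helper).

    A leading BOM selects the byte order and is not part of the text.
    Without a BOM, fall back to big endian, the Unicode default.
    """
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")            # assumed default: big endian


def decode_utf16be_label(data: bytes) -> str:
    """Decode bytes labelled "UTF-16BE" (hypothetical helper).

    The byte order is fixed by the label, so a leading FE FF is kept
    in the result as the character U+FEFF.
    """
    return data.decode("utf-16-be")


if __name__ == "__main__":
    with_bom = b"\xfe\xff\x00A"   # BOM followed by big-endian "A"
    without_bom = b"\x00A"        # big-endian "A", no BOM

    print(repr(decode_utf16_label(with_bom)))      # 'A'
    print(repr(decode_utf16_label(without_bom)))   # 'A'
    print(repr(decode_utf16be_label(with_bom)))    # '\ufeffA'
```

For BOM-less input both labels give the same text, which is the sense in which the default "is" UTF-16BE; they differ only once a BOM is present. Note that Python's built-in "utf-16" codec is not used here, because without a BOM it falls back to the platform's native byte order rather than to big endian.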

The free Mac text editor TextWrangler presents the option like so, in 
its encoding menu:
 
  .............................
[ UTF-8                         ]
[ UTF-16                        ]
  .............................
[ UTF-8, with BOM               ]
[ UTF-16, no BOM                ]
[ UTF-16 Little Endian          ]
[ UTF-16 Little Endian, no BOM  ]
  .............................

-- 
Leif H Silli
Received on Thursday, 22 November 2012 02:04:19 GMT
