[Bug 13396] i18n-ISSUE-77: HTTP and defaulting to UTF-16LE

From: <bugzilla@jessica.w3.org>
Date: Thu, 28 Jul 2011 08:03:43 +0000
To: public-i18n-core@w3.org
Message-Id: <E1QmLZH-0003wq-L2@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=13396

Henri Sivonen <hsivonen@iki.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hsivonen@iki.fi

--- Comment #1 from Henri Sivonen <hsivonen@iki.fi> 2011-07-28 08:03:42 UTC ---
(In reply to comment #0)
> 8.2.2.2 Character encodings
> http://www.w3.org/TR/html5/parsing.html#character-encodings-0
> 
> Supported by the i18n WG.
> 
> "When a user agent is to use the UTF-16 encoding but no BOM has been found,
> user agents must default to UTF-16LE."
> 
> If the HTTP header declares the file to be UTF-16BE, which I believe it can,
> and in which case a BOM should *not* be used, then I think that this would not
> be true.

In that case the user agent is to use the UTF-16BE encoding, not the UTF-16
encoding, so the quoted sentence doesn't apply. Still, the quoted sentence
shouldn't say "UTF-16LE". It should say "little-endian UTF-16", unless the
spec intends the reported encoding for the document to change, and I'm pretty
sure that's not the intention.

> If the HTTP header declares the file to be UTF-16, then there must be
> a BOM, so I assume that this is a recovery mechanism if someone does declare
> UTF-16 in HTTP but omits the BOM.

Yes.
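
The behaviour discussed above could be sketched roughly as follows. This is
only an illustration under stated assumptions, not text from the spec: the
function name decode_utf16 and its (text, reported_encoding) return value are
made up for this example, and real user agents sniff encodings in more steps
than shown here.

```python
import codecs


def decode_utf16(data, declared):
    """Hypothetical helper: decode bytes given the label declared over HTTP.

    Returns (text, reported_encoding). The names are assumptions made for
    this sketch, not taken from the HTML5 spec.
    """
    label = declared.strip().lower()

    # Endianness declared explicitly: use it as-is, no BOM is expected.
    if label == "utf-16be":
        return data.decode("utf-16-be"), "utf-16be"
    if label == "utf-16le":
        return data.decode("utf-16-le"), "utf-16le"
    if label != "utf-16":
        raise ValueError("not a UTF-16 label: %r" % declared)

    # Plain "UTF-16": a BOM, if present, decides the byte order.
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be"), "utf-16"
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le"), "utf-16"

    # Recovery case: "UTF-16" declared but no BOM found. Decode as
    # little-endian UTF-16 while the reported encoding stays "utf-16".
    return data.decode("utf-16-le"), "utf-16"
```

Note that in the recovery case the bytes are read as little-endian, but the
reported encoding remains "utf-16" rather than changing to "utf-16le", which
is the distinction the wording "little-endian UTF-16" is meant to preserve.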

Received on Thursday, 28 July 2011 08:03:50 GMT
