- From: Anne van Kesteren <annevk@opera.com>
- Date: Tue, 27 Dec 2011 15:52:01 +0100
I ran some utf-16 tests using 007A as input data, optionally preceded by FFFE or FEFF, and with utf-16, utf-16le, and utf-16be declared in the Content-Type header. For WebKit I tested both Safari 5.1.2 and Chrome 17.0.963.12. Trident is Internet Explorer 9 on Windows 7. Presto is Opera 11.60. Gecko is Nightly 12.0a1 (2011-12-26).

HTTP      BOM   Trident  WebKit  Gecko   Presto
utf-16    -     7A00     7A00    007A    007A
utf-16le  -     7A00     7A00    7A00    7A00
utf-16be  -     007A     007A    007A    007A
utf-16    FFFE  7A00     7A00    7A00    7A00
utf-16le  FFFE  7A00     7A00    7A00    7A00
utf-16be  FFFE  7A00     7A00    FFFD*   FFFD*
utf-16    FEFF  007A     007A    007A    007A
utf-16le  FEFF  007A     007A    FFFD**  FFFD**
utf-16be  FEFF  007A     007A    007A    007A

* Gecko decodes FFFE 007A as FFFD followed by FE00, presumably dropping the 7A. Opera decodes it as FFFD 007A.

** Gecko decodes FEFF 007A as FFFD followed by 00FF, presumably dropping the 7A. Opera decodes it as FFFD 7A00.

It seems that in Trident/WebKit utf-16 and utf-16le are labels for the same encoding, and that the BOM takes precedence over the declared encoding. Gecko and Presto match existing specifications around utf-16, with different error handling (afaict).

I think http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html should follow Trident/WebKit. Specifically:

* utf-16 defaults to utf-16le in the absence of a BOM.
* utf-16le becomes a label for utf-16.
* A BOM overrides the direction (of utf-16 / utf-16be) and is removed from the output.

-- 
Anne van Kesteren
http://annevankesteren.nl/
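
For illustration, the proposed Trident/WebKit-style behaviour can be sketched in Python as below. This is only a rough sketch, not text from the Encoding draft: the function name decode_utf16 is hypothetical, and replacing malformed input with U+FFFD (errors="replace") is an assumption, since the proposal above does not cover error handling.

```python
def decode_utf16(label: str, data: bytes) -> str:
    """Decode `data` per the proposed rules (hypothetical helper)."""
    label = label.lower()
    if label not in ("utf-16", "utf-16le", "utf-16be"):
        raise ValueError(f"not a utf-16 label: {label!r}")

    if data[:2] == b"\xff\xfe":
        # Little-endian BOM: overrides the label and is removed.
        codec, data = "utf-16-le", data[2:]
    elif data[:2] == b"\xfe\xff":
        # Big-endian BOM: overrides the label and is removed.
        codec, data = "utf-16-be", data[2:]
    elif label == "utf-16be":
        codec = "utf-16-be"
    else:
        # utf-16 and utf-16le are the same encoding; no BOM means little-endian.
        codec = "utf-16-le"

    # Assumption: malformed input becomes U+FFFD.
    return data.decode(codec, errors="replace")
```

With the test input from the table (bytes 00 7A), decode_utf16("utf-16", b"\x00\x7a") gives U+7A00, and decode_utf16("utf-16le", b"\xfe\xff\x00\x7a") gives U+007A, matching the Trident and WebKit columns.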
Received on Tuesday, 27 December 2011 06:52:01 UTC