- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 28 Dec 2011 16:13:56 +0100
On Wed, 28 Dec 2011 12:30:49 +0100, Leif Halvard Silli
<xn--mlform-iua at målform.no> wrote:
> I spotted a shortcoming in your testing:
>
>> I ran some utf-16 tests using 007A as input data, optionally preceded by
>> FFFE or FEFF, and with utf-16, utf-16le, and utf-16be declared in the
>> Content-Type header. For WebKit I tested both Safari 5.1.2 and Chrome
>> 17.0.963.12. Trident is Internet Explorer 9 on Windows 7. Presto is
>> Opera 11.60. Gecko is Nightly 12.0a1 (2011-12-26).
>>
>> HTTP      BOM  Trident  WebKit  Gecko  Presto
>> utf-16    -    7A00     7A00    007A   007A
>> utf-16le  -    7A00     7A00    7A00   7A00
>> utf-16be  -    007A     007A    007A   007A
>
> The above test row is not complete. You should also run a BOM-less test
> using the UTF-16 label but where the 007A is represented in the
> big-endian way - a bit like I did here:
> <http://malform.no/testing/utf/#html-table-7>. Then you get as result
> that Opera and Firefox do not take it for a given that files sent as
> 'utf-16' are big-endian:
>
> HTTP      BOM  Trident  WebKit  Gecko  Presto
> utf-16    -    gibb*    gibb*   007A   007A
>
> *gibb = gibberish/mojibake.

I get U+7A00 as I indicated above. I would not qualify that as gibberish
personally. (My table is somewhat confusing as input 007A was meant to
describe octets, but the table describes code points.)

Anyway, per
http://lists.whatwg.org/pipermail/whatwg-whatwg.org/2009-July/021102.html
Presto and Gecko do have some magic, but it seems better if they were the
same as Trident (and WebKit).

> That the BOM is removed from the output for utf-16be labelled files,
> means that the 'utf-16be' labelled file nevertheless is treated as
> UTF-16 (per UTF-16's specification). (Otherwise, if it had not been
> removed, the BOM character should have caused quirks mode.)
>
> Taking what you did not test for into account, it would make sense if
> 'utf-16' continues to be treated as a label under which both big-endian
> and little-endian can be expected. And thus, that Webkit and IE starts to
> detect when UTF-16 is big-endian, but without a BOM.

I am not sure what you are trying to say here.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Wednesday, 28 December 2011 07:13:56 UTC
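
For readers following the octets-versus-code-points point above, here is a minimal Python sketch of the BOM-less behaviour shown in the Trident and WebKit columns: the octets 00 7A come out as U+7A00 under the `utf-16` and `utf-16le` labels and as U+007A under `utf-16be`. The function name `decode_utf16` is hypothetical. Letting a leading BOM override the declared label is an assumption made for illustration; it is not something the quoted table shows. This is not any browser's actual implementation.

```python
import codecs


def decode_utf16(octets: bytes, label: str) -> str:
    """Decode octets under a declared HTTP charset label (illustrative only)."""
    # Assumption: a leading BOM overrides the label and is stripped from the output.
    if octets.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return octets[2:].decode("utf-16-le")
    if octets.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return octets[2:].decode("utf-16-be")
    # No BOM: only an explicit 'utf-16be' label selects big-endian;
    # 'utf-16' and 'utf-16le' are both read as little-endian, matching
    # the Trident and WebKit columns of the table.
    encoding = "utf-16-be" if label.lower() == "utf-16be" else "utf-16-le"
    return octets.decode(encoding)


if __name__ == "__main__":
    data = bytes([0x00, 0x7A])  # the "007A" input, as octets
    for label in ("utf-16", "utf-16le", "utf-16be"):
        text = decode_utf16(data, label)
        print(label, " ".join(f"U+{ord(ch):04X}" for ch in text))
```

Gecko and Presto differ from this sketch in the `utf-16` row, where they report 007A; the "magic" behind that is not modelled here.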