[whatwg] UTF-16 encoding default

There's a page (specifically http://www.microsoft.com/windowsmobile/mobile/en-us/totalaccess/software/software/eula-sw-netflix.mspx) that is served with a Content-Type header of "text/html; charset=utf-16" and has no BOM. The references I've seen (RFC 2781, as well as http://unicode.org/faq/utf_bom.html#gen7) say that in this case the content should be assumed to be UTF-16BE. The page, however, is actually in UTF-16LE.
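To illustrate the mismatch, here is a minimal Python sketch. The sample string is a made-up stand-in, not the actual page content; it only shows what happens when BOM-less UTF-16LE bytes are decoded as UTF-16BE.

```python
# Hypothetical stand-in for the start of the page; not the real content.
sample = "<html>"

# Bytes as they would arrive on the wire: little-endian, no BOM.
raw = sample.encode("utf-16-le")

# Decoding with the big-endian default swaps the two bytes of every code unit.
as_be = raw.decode("utf-16-be")

# Decoding as little-endian recovers the original text.
as_le = raw.decode("utf-16-le")

print(repr(as_be))
print(repr(as_le))
```

With the big-endian default, "<" (bytes 3C 00) comes out as U+3C00 instead of U+003C, so markup is never recognised as markup.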

All browsers seem to do some sort of unspecified magic and figure out that the page is in UTF-16LE. I was wondering whether that magic could be described and added to the HTML5 spec, so that the spec covers rendering the above page as expected. As the draft spec stands, I believe the page should be rendered as garbage.
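One guess at what such magic could look like, sketched in Python. This is an assumption for illustration only, not a description of what any browser actually does: the function name guess_utf16_endianness and the zero-byte counting rule are hypothetical. The idea is that mostly-ASCII text encoded as UTF-16 has its zero bytes at odd offsets in little-endian and at even offsets in big-endian.

```python
import codecs


def guess_utf16_endianness(data: bytes) -> str:
    """Guess the byte order of UTF-16 data (hypothetical heuristic).

    Returns "utf-16-le" or "utf-16-be". Falls back to big-endian,
    the default RFC 2781 gives for BOM-less data labelled "utf-16".
    """
    # A BOM, when present, settles the question.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # Count zero bytes at even and odd offsets.
    even_zeros = data[0::2].count(0)
    odd_zeros = data[1::2].count(0)

    # ASCII-range characters in little-endian put the zero byte second.
    if odd_zeros > even_zeros:
        return "utf-16-le"
    return "utf-16-be"


page_bytes = "<html>".encode("utf-16-le")  # stand-in for the page body
print(guess_utf16_endianness(page_bytes))
```

A rule like this would only help for content dominated by ASCII-range characters such as HTML markup; text made up mostly of other scripts would need a different signal.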

Cheers,
kats

PS - the page also has a meta tag that says the charset is iso-8859-1. *sigh*

Received on Tuesday, 23 June 2009 18:42:29 UTC