- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 31 Mar 2010 14:12:00 +0300
Currently, the spec says that document.open() sets the document's character encoding to UTF-16. This is what IE does, except that IE uses the label "unicode" instead of "UTF-16".
Demo: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/438

Gecko and WebKit set the document's character encoding to UTF-8 even though the parser operates on UTF-16.
Demo: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/439

When loading external resources that don't have encoding labels, IE, Gecko and WebKit all use UTF-8 to decode the external resource.
Demo: http://software.hixie.ch/utilities/js/live-dom-viewer/saved/437

Opera doesn't support document.charset or document.characterSet, but demo 437 and the demos discussed below show that Opera applies the default encoding (Windows-1252) to external resources referenced from document.open()ed documents.

Spec change request: Please change the spec to say that document.open() sets the document's character encoding to UTF-8 even though the parser operates on UTF-16 DOMStrings.

My real interest in this topic isn't so much the initial character encoding value as the effect of <meta charset> on document.open()ed documents. Consider this demo in Gecko with the old HTML parser:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/434

The demo alerts twice: first showing the REPLACEMENT CHARACTER and then showing LATIN SMALL LETTER R WITH ACUTE. First, Gecko parses the document with UTF-8 as the document's character encoding. During that parse, the value ISO-8859-2 from the meta is added to the cache entry for this stream (see my earlier email about reloading document.open()ed documents). Then the document is implicitly reloaded with ISO-8859-2 as the document's character encoding.
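
The two alerts can be reproduced outside a browser by decoding the same byte under each encoding. The following is a minimal sketch in Python, assuming that the external script in the demo stores the character as the single ISO-8859-2 byte 0xE0. The demo's actual bytes are not quoted in this email, so this is an illustration rather than a copy of the test case, and decode_unlabeled is a hypothetical helper name.

```python
# Hypothetical illustration: one unlabeled byte, decoded the way each
# scenario described above would decode it.
RAW = bytes([0xE0])  # assumed content: LATIN SMALL LETTER R WITH ACUTE in ISO-8859-2


def decode_unlabeled(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, substituting U+FFFD for malformed input."""
    return raw.decode(encoding, errors="replace")


# Initial parse: the document's character encoding is UTF-8.
first_pass = decode_unlabeled(RAW, "utf-8")
# After the implicit reload: the encoding named in the meta is used.
after_reload = decode_unlabeled(RAW, "iso-8859-2")
# Opera's default encoding for unlabeled external resources.
opera_default = decode_unlabeled(RAW, "windows-1252")

for label, text in [
    ("UTF-8", first_pass),
    ("ISO-8859-2", after_reload),
    ("Windows-1252", opera_default),
]:
    print(f"{label:13} U+{ord(text):04X}")
```

Under these assumptions, the UTF-8 pass yields U+FFFD (REPLACEMENT CHARACTER) because a lone 0xE0 is an incomplete UTF-8 sequence, and the ISO-8859-2 pass yields U+0155 (LATIN SMALL LETTER R WITH ACUTE), matching the order of the two alerts.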
This implicit reload was implemented in https://bugzilla.mozilla.org/show_bug.cgi?id=255820 back when Gecko used UTF-16 instead of UTF-8 as the document's character encoding for document.open()ed docs, and using UTF-16 for external resources made the external resources fail to parse.

Curiously, the implicit reloading behavior isn't particularly robust. In some situations the reload doesn't happen. I don't know what the logic is. Demo with the order of meta and script swapped:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/435

None of IE, WebKit or Opera let the meta charset in a document.open()ed document have any effect, which seems to suggest that Gecko might be trying unnecessarily hard in this scenario.

Due to the test case for https://bugzilla.mozilla.org/show_bug.cgi?id=255820 I made the meta charset change the document's character encoding (but not reload) when the HTML5 parser is enabled in Gecko. See demos 435 and 434 with html5.enable=true. However, now it seems it might be better to revert that change to align with IE and WebKit, unless sites now depend on the Gecko behavior.

Do other browser vendors have data showing sites depending on Gecko's behavior when loading external resources for document.open()ed docs?

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 31 March 2010 04:12:00 UTC