- From: And Clover <and-py@doxdesk.com>
- Date: Thu, 01 Apr 2010 05:26:32 +0200
Henri Sivonen wrote:
> Spec change request: Please change the spec to say that document.open()
> sets the document's character encoding to UTF-8

+1. UTF-16 is a troublesome encoding for [X]HTML[5] documents and should be consistently discouraged. Because it is not an ASCII superset, it interacts very poorly with byte interfaces in HTTP and form submissions. No browser will actually try to submit a form as UTF-16 for this reason, but it still causes problems: e.g. Firefox misleadingly sets the `_charset_` hack field to 'UTF-16' even though the submission is UTF-8-encoded.

> even though the parser operates on UTF-16 DOMStrings.

The term 'UTF-16' can mean two very different things: either a sequence of 16-bit code units (as in DOMString), or a sequence of bytes which, read as UTF-16LE or UTF-16BE, represent 16-bit code units. Unicode's tradition of conflating the code unit sequence with the byte sequence has caused much confusion.

DOM Level 3 LS made the mistake of saying that, because DOMStrings consist of UTF-16 code units, XML documents parsed from `LSInput.characterStream`/`StringData` should receive the `encoding` 'UTF-16', as if the parser had converted UTF-16 bytes to characters, though no such conversion has actually taken place. Consequently, when you serialise a document parsed from a string in DOM Level 3 LS, you get an unexpected and unwanted UTF-16 document.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/
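The message's distinction between UTF-16 as 16-bit code units and UTF-16 as bytes, and its point that UTF-16 is not an ASCII superset, can be sketched with Python's built-in codecs. This is a minimal illustration under stated assumptions: the sample strings and the helpers `utf16_code_units` and `is_ascii_superset` are hypothetical, written only for this example, and the code does not model any browser's form submission or any DOM Level 3 LS implementation.

```python
# Sketch (illustrative only): UTF-16 code units vs UTF-16 byte sequences,
# and why UTF-16 is not an ASCII superset. Uses only built-in codecs.

def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit code units of s (the DOMString view of the text).

    Big-endian bytes are used only as a convenient intermediate; the
    resulting integers do not depend on any byte order.
    """
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def is_ascii_superset(encoding: str) -> bool:
    """True if every ASCII character encodes to the same single byte."""
    return all(chr(c).encode(encoding) == bytes([c]) for c in range(128))


text = "a=\u00e9\U0001F600"  # 'a', '=', 'é' and an emoji outside the BMP

# One sequence of code units (4 characters, but 5 code units: the emoji
# needs a surrogate pair)...
print([hex(u) for u in utf16_code_units(text)])
# ['0x61', '0x3d', '0xe9', '0xd83d', '0xde00']

# ...but different byte sequences depending on byte order.
print(text.encode("utf-16-le").hex(" "))  # 61 00 3d 00 e9 00 3d d8 00 de
print(text.encode("utf-16-be").hex(" "))  # 00 61 00 3d 00 e9 d8 3d de 00

# A form-style payload: UTF-8 keeps the ASCII bytes intact, while UTF-16
# interleaves NUL bytes that byte-oriented code does not expect.
print("q=caf\u00e9".encode("utf-8"))      # b'q=caf\xc3\xa9'
print("q=caf\u00e9".encode("utf-16-le"))  # b'q\x00=\x00c\x00a\x00f\x00\xe9\x00'

print(is_ascii_superset("utf-8"))      # True
print(is_ascii_superset("utf-16-le"))  # False
```

The code-unit list is the same whichever byte order is used for serialisation, which is why labelling a string-parsed document as 'UTF-16' describes no actual byte decoding step.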
Received on Wednesday, 31 March 2010 20:26:32 UTC