- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 16 Nov 2011 12:40:08 +0200
- To: public-webapps WG <public-webapps@w3.org>
I landed support for HTML parsing in XHR in Gecko today. It has not yet propagated to the Nightly channel.

Here's how it behaves:

 * Contrary to the spec, for response types other than "" and "document", character encoding determination for text/html happens the same way as for unknown types.

 * For text/html responses for response type "" and "document", the character encoding is established by taking the first match from this list in this order:
   - HTTP charset parameter
   - BOM
   - HTML-compliant <meta> prescan up to 1024 bytes
   - UTF-8

 * In particular, the following have no effect on the character encoding:
   - <meta> discovered by the tree builder algorithm
   - The user-configurable fallback encoding
   - Locale-specific defaults
   - The encoding of the document that invoked XHR
   - Byte patterns in the response (beyond BOM and <meta>). Even the BOMless UTF-16 detection that Firefox does when heuristic detection has otherwise been turned off is skipped for XHR.

 * When there is no HTTP-level charset parameter, progress events are stalled and responseText is made null until the parser has found a BOM or a charset <meta>, or has seen 1024 bytes or the EOF without finding either.

 * If the response is a multipart response, XHR behaves as if it didn't support HTML parsing for the subparts of the response. (The multipart handling infrastructure in Gecko makes assumptions that are incorrect for the off-the-main-thread parsing infrastructure. Since the plan is to move XML parsing off the main thread, too, we'll need to find out whether multipart support is a worthwhile feature to keep. If it is, we need to add some mechanisms to make multipart work when subparts are parsed off the main thread. If not, we should drop the feature, in my opinion.)

 * HTML parsing is supported in the synchronous mode, but I'd be quite happy to remove that support in order to curb sync XHR proliferation.
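The encoding rules and the progress-event stall described above can be sketched in Python. This is an illustration under stated assumptions, not Gecko's code: the function names `sniff_xhr_html_encoding`, `can_release_progress` and `provisional_response_text` are hypothetical, and the regular expression is a simplified stand-in for the HTML <meta> prescan (label normalization and the rest of the real prescan's rules are omitted). The last function sketches the responseText fallback proposed under Risks, which would only be adopted if the stall turns out to be a Real Problem.

```python
import codecs
import re
from typing import Optional

# Illustrative sketch only; not Gecko's implementation. All names are hypothetical.

PRESCAN_LIMIT = 1024  # the <meta> prescan looks at no more than the first 1024 bytes

BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

# Simplified stand-in for the HTML <meta> prescan: handles <meta charset=...>
# and content="...; charset=...". The lookahead requires a delimiter after the
# name, so a charset value cut off mid-stream is not matched.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)(?=[\s\"';/>])",
    re.IGNORECASE,
)


def sniff_xhr_html_encoding(http_charset: Optional[str], body: bytes) -> str:
    """First match wins: HTTP charset, BOM, <meta> prescan, then UTF-8."""
    if http_charset:
        return http_charset.lower()
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    match = META_CHARSET.search(body[:PRESCAN_LIMIT])
    if match:
        return match.group(1).decode("ascii").lower()
    # No user fallback, locale default, caller encoding or heuristic detection.
    return "utf-8"


def can_release_progress(http_charset: Optional[str], received: bytes, eof: bool) -> bool:
    """True once the encoding is settled, so progress events may fire."""
    if http_charset or eof or len(received) >= PRESCAN_LIMIT:
        return True
    if any(received.startswith(bom) for bom, _ in BOMS):
        return True
    return META_CHARSET.search(received) is not None


# Byte values listed in the proposed fallback under Risks.
SAFE_BYTES = frozenset(
    [0x09, 0x0A, 0x0C, 0x0D, 0x26, 0x27]
    + list(range(0x20, 0x23))  # 0x20 - 0x22
    + list(range(0x2C, 0x40))  # 0x2C - 0x3F
    + list(range(0x41, 0x5B))  # 0x41 - 0x5A
    + list(range(0x61, 0x7B))  # 0x61 - 0x7A
)


def provisional_response_text(received: bytes) -> str:
    """Decode the longest prefix made only of SAFE_BYTES (all of them ASCII)."""
    end = 0
    while end < len(received) and received[end] in SAFE_BYTES:
        end += 1
    return received[:end].decode("ascii")
```

Because `sniff_xhr_html_encoding` only searches `body[:1024]`, a <meta> that appears later in the response has no effect, consistent with the list of things that do not influence the encoding.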
I believe the implementation otherwise matches the spec, but exposing the document via responseXML should be considered to be at risk. See below.

Risks:

 * Stalling progress events while waiting for <meta> could, in theory, deadlock an existing Web app when all of the following hold: the Web app does long polling with responseType == "", it gets a text/html response without a charset declaration, the first chunk of the response is shorter than 1024 bytes, and the server won't send more before the client side informs the server via another channel that the first chunk has been processed.
   - If this turns out to be a Real Problem, my plan is to make responseText show decoded text up to the first byte that isn't one of 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A.
   - I think this risk is low.

 * responseXML now becomes non-null for HTTP error responses that have a text/html response body. This might be a problem if Web apps that expect to get XML responses check for HTTP errors by checking responseXML for null. We'll see how much breakage Nightly testers report.
   - I think this risk is high.
   - If this turns out to be a Real Problem, the solution would be to make HTML parsing (including the <meta> prescan) available only when responseType == "document". (Note that xhr.response maps to responseText when responseType == "", so if responseXML is made null when responseType == "", xhr.response wouldn't work for retrieving the tree.) This change might even be a good idea performance-wise, since it avoids adding HTML parsing overhead for legacy uses of XHR that don't set responseType.

Spec change proposals so far:

 * I suggest making responseType modes other than "" and "document" not consider the internal character encoding declarations in HTML (or XML).

Spec change proposals that I'm not making yet but might make in the near future:

 * Making responseType == "" not support HTML parsing at all and treating text/html as an unknown type for the purpose of character encoding.
 * Making XHR not support HTML parsing in the synchronous mode.

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Wednesday, 16 November 2011 10:40:42 UTC