- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 29 Sep 2011 14:49:17 +0300
- To: public-webapps@w3.org
http://dev.w3.org/2006/webapi/XMLHttpRequest-2/#text-response-entity-body says:
"The text response entity body is a DOMString representing the response entity body."
and
"If charset is null and mime is text/html follow the rules set forth in the HTML specification to determine the character encoding. Let charset be the determined character encoding."

Furthermore, the response entity body is defined while the state is LOADING:
"The response entity body is the fragment of the entity body of the response received so far (LOADING) or the complete entity body of the response (DONE)."

The spec is silent on what responseText for text/html should be if responseText is read before it is known that "the rules set forth in the HTML specification to determine the character encoding" will no longer change their result. This looks like a spec bug.

There are three obvious solutions:
1) Change the encoding used for responseText as more data becomes available, so that previous responseText is not guaranteed to be a prefix of subsequent responseText.
2) Make XHR pretend it hasn't seen any data at all until it has seen enough that the encoding decision is final.
3) Do not use the HTML rules for responseText.

Solution #1 is what Gecko now does with XML, but fortunately XML doesn't allow non-ASCII before the XML declaration, so you can't detect this from outside the black box. With HTML, solution #1 would mean handing a footgun to Web authors who might not prepare for cases where previous responseText stops being a prefix of subsequent responseText.

Solution #2 could, in the worst case (assuming we aren't doing the worst of worst cases; i.e. we aren't allowing parser restarts arbitrarily late), stall until 1024 bytes have been seen. That risks breaking existing comet apps, if there exist comet apps that use responseText with slowly-arriving text/html responses that don't have a BOM, don't have an early <meta> and don't have an HTTP charset, and that require the JS part of the app to act on data within the first 1024 bytes before the server sends more. (OK, it would be silly to write comet apps with responseText using text/html as opposed to e.g. text/plain or whatever and not put a charset declaration on the HTTP layer, but this is the Web, so who knows if such apps exist.)

Solution #3 would make the text/html side inconsistent with the XML side and could lead to confusion, especially in the default mode if responseXML does honor <meta>s (within the first 1024 bytes). Solution #3 would be easy to implement, though.

As a complication, since Saturday, Gecko supports a "moz-chunked-text" response type, which modifies the behavior of response and responseText so that they only show a string consisting of new text since the previous progress event. "moz-chunked-text" isn't specced anywhere (to my knowledge), but IRC discussion with Olli indicates that it's assumed that, even going forward, the encoding decision is made the same way for the "moz-chunked-text" and "text" response types. This assumption obviously excludes solution #1 above, since chunks reported before <meta> could use a different encoding compared to chunks after <meta>, which wouldn't make sense. It's worth noting that "moz-chunked-text" turns off responseXML, so it's not unthinkable to use non-HTML rules for "moz-chunked-text".

In IRC discussion with Olli, we gravitated towards solution #2, but we didn't consider the comet stalling aspect in that discussion. In any case, all this should be specced properly and it currently isn't. :-(

It seems to me that all these cannot be true:
* responseText and responseXML use the same encoding detection rules.
* The "text" and default modes use the same encoding detection rules.
* "text" and "moz-chunked-text" use the same encoding detection rules.
* "moz-chunked-text" uses the same encoding for all chunks.
* All imaginable badly written comet apps are guaranteed to continue working.
* responseXML considers <meta> in a deterministic way (no timer for bailing out before 1024 bytes if the network stalls).

Which property do we give up?

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
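
To make the difference between solutions #1 and #2 concrete, here is a minimal Python sketch. It is not how Gecko or any other engine implements XHR. The names `sniff_encoding`, `response_text_solution1` and `response_text_solution2`, the windows-1252 fallback, and the sample bytes are all assumptions made for illustration, and the sniffing function is only a crude stand-in for the HTML encoding rules (it recognizes nothing but a literal `charset=utf-8` in the first 1024 bytes).

```python
# Illustrative sketch only: a toy model of the responseText problem,
# not a description of any browser's actual implementation.

META_SCAN_LIMIT = 1024  # bytes assumed to be inspected for an encoding declaration


def sniff_encoding(received, fallback="windows-1252"):
    """Return (encoding, is_final) for the bytes received so far.

    Crude stand-in for the HTML rules: only a literal "charset=utf-8"
    within the first 1024 bytes is recognized. The fallback encoding
    is an assumption chosen for this example.
    """
    head = received[:META_SCAN_LIMIT].lower()
    if b"charset=utf-8" in head:
        return "utf-8", True
    # Without a declaration, the guess is final only once the scan limit is reached.
    return fallback, len(received) >= META_SCAN_LIMIT


def response_text_solution1(received):
    """Solution #1: always decode with the current best guess."""
    encoding, _ = sniff_encoding(received)
    return received.decode(encoding, errors="replace")


def response_text_solution2(received):
    """Solution #2: expose nothing until the encoding decision is final."""
    encoding, is_final = sniff_encoding(received)
    return received.decode(encoding, errors="replace") if is_final else ""


# A UTF-8 response whose <meta> arrives after some non-ASCII text.
chunk1 = "<title>caf\u00e9</title>".encode("utf-8")
chunk2 = b"<meta charset=utf-8>"

early1 = response_text_solution1(chunk1)
late1 = response_text_solution1(chunk1 + chunk2)
print(late1.startswith(early1))  # False: the earlier text is no longer a prefix

early2 = response_text_solution2(chunk1)
late2 = response_text_solution2(chunk1 + chunk2)
print(late2.startswith(early2))  # True, but early2 was empty (the stall)
```

Under these assumptions, solution #1 first exposes the é as two windows-1252 characters and later replaces them with a single character, while solution #2 keeps the prefix property at the cost of showing nothing for the first chunk.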
Received on Thursday, 29 September 2011 11:49:55 UTC