- From: Maciej Stachowiak <mjs@apple.com>
- Date: Sun, 29 Jul 2007 08:26:11 -0700
- To: Jonas Sicking <jonas@sicking.cc>
- Cc: Web APIs WG <public-webapi@w3.org>
On Jul 28, 2007, at 11:38 PM, Jonas Sicking wrote:

> Maciej Stachowiak wrote:
>> On Jul 27, 2007, at 12:09 PM, Jonas Sicking wrote:
>>>
>>> Anne van Kesteren wrote:
>>>> I've been looking at overrideMimeType implementations in Gecko
>>>> and WebKit and it seems like they differ a bit. In Gecko it has
>>>> to be invoked before send(), but in WebKit it would work if you
>>>> invoke it just before getting responseXML or responseText.
>>>> Neither implementation seems to do any input checks.
>>>> If you have any opinion on how it should be specified, I suppose
>>>> now would be the time to air your thoughts.
>>>
>>> Of course I prefer the Mozilla way :)
>>>
>>> It does seem fairly complicated to allow it to be set after the
>>> download is finished, though. You do have the stream stored
>>> in .responseBody, but at that point all encoding information has
>>> been lost. For HTML parsing (which I hope the spec will support in
>>> the future) there is a pile of rules used to guess the encoding,
>>> all of which would be useful to use, but can't be used if all you
>>> have access to is the undecoded responseBody.
>>
>> Why would the encoding information be lost? The only sources of
>> encoding info are the responseText itself and the HTTP headers, both
>> of which XMLHttpRequest needs to provide anyway.
>
> responseText is not the raw byte stream received off the wire; it has
> already been decoded into UTF-16 using whatever algorithm we define
> for determining the encoding. HTML decoding is a lot more complicated,
> since you have to first guess an encoding and then start to parse the
> document, but if you find a
>
> <meta http-equiv="Content-Type" content="text/html; charset=?">
>
> where the charset is different from what you guessed, you have to
> restart from the beginning using the charset declared in the meta tag.
>
> Yes, it would definitely be possible for the implementation to keep
> around the raw byte stream and either lazily decode responseText, or
> keep both the UTF-16 responseText and the raw byte stream around.

A third possibility is to remember what encoding you used when decoding
and turn the UTF-16 back into the original bytes, though I suppose that
wouldn't work if you hit encoding errors originally.

> It is a bit quirky behavior, though, since setting overrideMimeType
> could then change the encoding and therefore both responseXML and
> responseText.

If XHR2 offers responseBody with a raw byte array of some kind,
implementations will be required to keep the raw bytes around anyway.

Regards,
Maciej
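
For reference, a minimal sketch of the calling pattern under discussion,
written to satisfy the stricter (Gecko) ordering in which
overrideMimeType() must precede send(); the URL and charset are
illustrative only:

// Calling overrideMimeType() before send() satisfies the Gecko ordering;
// at the time, WebKit also honoured a later call made before
// responseText/responseXML were first read.
const xhr = new XMLHttpRequest();
xhr.open("GET", "/example.xml", true);
xhr.overrideMimeType("text/xml; charset=ISO-8859-1");
xhr.onreadystatechange = () => {
  if (xhr.readyState === 4) {
    // By this point the response bytes have already been decoded to UTF-16
    // using whichever charset the implementation settled on (here, the
    // overridden one).
    console.log(xhr.responseText);
  }
};
xhr.send(null);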
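
And a rough sketch of the "guess, then possibly restart" behaviour Jonas
describes for HTML, simplified down to a single regex-based meta scan
(the real sniffing rules are far more involved); it mainly illustrates
why the raw bytes have to be retained:

// Simplified illustration of restarting the decode when a <meta> declaration
// contradicts the guessed charset. The regex is a stand-in for the real
// sniffing rules; the point is that re-decoding needs the original bytes.
function findMetaCharset(markup: string): string | null {
  const m = /<meta[^>]+charset=["']?([\w-]+)/i.exec(markup);
  return m ? m[1] : null;
}

function decodeHtml(bytes: Uint8Array, guessedCharset: string): string {
  let text = new TextDecoder(guessedCharset).decode(bytes);
  const declared = findMetaCharset(text);
  if (declared && declared.toLowerCase() !== guessedCharset.toLowerCase()) {
    // Restart from the beginning with the declared charset -- impossible if
    // only the already-decoded UTF-16 text had been kept.
    text = new TextDecoder(declared).decode(bytes);
  }
  return text;
}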
Received on Sunday, 29 July 2007 15:26:20 UTC