- From: Alexey Proskuryakov <ap-carbon@rambler.ru>
- Date: Fri, 18 May 2007 12:10:23 +0400
- To: Anne van Kesteren <annevk@opera.com>, "Web API WG (public)" <public-webapi@w3.org>
On 5/17/07 8:09 PM, "Anne van Kesteren" <annevk@opera.com> wrote:

> Based on feedback from Microsoft the algorithm used by responseText now
> takes the potential BOM of the entity body into account. Please let me
> know if you spot any issues with this:

I'm not quite sure about having two separate variables, "charset" and "charset-http". If I'm not mistaken, the algorithm can be streamlined by using only one of these:

-----------------------
1. If the response entity body is "null" return null and terminate these steps.
2. Let charset be "null".
3. If there is no Content-Type header or there is a Content-Type header which contains a MIME type that is text/xml, application/xml, text/xsl or ends in +xml (ignoring any parameters) use the rules set forth in the XML specification to determine the character encoding. Let charset be the determined character encoding ***and terminate these steps***.
4. If charset is "null" and the Content-Type MIME type contains a charset parameter let charset be the value of that parameter.
5. If charset is "null" <do the BOM detection>.
6. If charset is "null" let charset be "UTF-8".
7. Return the result of decoding the response entity body using charset. Or, if that fails, return null.
-----------------------

I think step 5 (BOM detection) could be written in a declarative manner, similar to how it is defined in CSS <http://www.w3.org/TR/CSS21/syndata.html#q23>. The current algorithm may be slightly misleading in that it misses some edge cases (what to do if the reply is shorter than 4 bytes?) that should only be interesting to implementors anyway.

- WBR, Alexey Proskuryakov
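For illustration only, the single-variable steps proposed above could be sketched in Python roughly as follows. All function names here (`choose_charset`, `response_text`, `sniff_bom`, `xml_encoding`, `parse_content_type`) are hypothetical. The BOM table uses the standard Unicode signatures and is an assumption, since the draft's own table is not quoted in this message. `xml_encoding` is a deliberately simplified stand-in for the XML specification's rules, and "terminate these steps" in step 3 is read here as "skip steps 4 to 6 and decode with the XML-derived encoding".

```python
import codecs
import re
from typing import Optional, Tuple

# Assumed BOM table (standard Unicode signatures), written declaratively.
# UTF-32 entries come first so FF FE 00 00 is not mistaken for UTF-16LE.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]
XML_TYPES = {"text/xml", "application/xml", "text/xsl"}
# Simplified: only finds an encoding declaration in ASCII-compatible input.
XML_DECL = re.compile(rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)")


def sniff_bom(body: bytes) -> Optional[str]:
    """Step 5: prefix matching, so bodies shorter than 4 bytes need no special case."""
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    return None


def parse_content_type(header: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (MIME type, charset parameter); either may be None."""
    if header is None:
        return None, None
    parts = [p.strip() for p in header.split(";")]
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            return parts[0].lower(), value.strip().strip('"')
    return parts[0].lower(), None


def xml_encoding(body: bytes) -> str:
    """Rough approximation of the XML rules: BOM, then declaration, then UTF-8."""
    found = sniff_bom(body)
    if found:
        return found
    match = XML_DECL.match(body)
    return match.group(1).decode("ascii") if match else "utf-8"


def choose_charset(body: bytes, content_type: Optional[str]) -> str:
    mime, param = parse_content_type(content_type)
    if mime is None or mime in XML_TYPES or mime.endswith("+xml"):
        return xml_encoding(body)                  # step 3
    return param or sniff_bom(body) or "utf-8"     # steps 4, 5, 6


def response_text(body: Optional[bytes], content_type: Optional[str]) -> Optional[str]:
    if body is None:                               # step 1
        return None
    try:
        # Step 7. A BOM, if present, is left in the text as U+FEFF.
        return body.decode(choose_charset(body, content_type))
    except (LookupError, UnicodeDecodeError):
        return None
```

Because `sniff_bom` only compares prefixes against a table, a two-byte reply such as FF FE is handled without any explicit length check, which is the kind of edge case a declarative definition avoids spelling out.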
Received on Friday, 18 May 2007 08:10:35 UTC