- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Wed, 16 Nov 2011 14:18:39 +0100
- To: "public-webapps@w3.org" <public-webapps@w3.org>
Hi,

while trying to fix a bug I discovered a weird workaround in Mozilla (<https://bugzilla.mozilla.org/show_bug.cgi?id=397234>).

The summary is:

1) when sending text using send(""), all browsers encode in UTF-8

2) the caller may have set the content-type header field before

3) if this was the case, the charset, if present, needs to be adjusted (<http://www.w3.org/TR/XMLHttpRequest/#the-send-method>)

4) due to broken content (GWT), Mozilla tries to preserve the case of the charset name, if it was the "right" one (so if the caller set 'UtF-8', that's what gets onto the wire). Apparently this was added because some servers didn't handle charset names properly.

So I wrote some tests to compare FF's behavior with other UAs. Summary:

- all UAs use the UTF-8 encoding for the payload

- Opera and IE do not rewrite the type; so if the caller sets the wrong charset, this is what is sent to the server

- Chrome, Safari and FF try to fix the charset param. All of them preserve the syntax (quoted-string vs token) and also handle single quotes incorrectly.

- Finally, only Firefox attempts to preserve the casing of the charset param - this may indicate that the workaround added for the aforementioned bug isn't needed anymore.
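To make step 3 concrete, here is a minimal sketch of one way a charset parameter could be rewritten to UTF-8 while keeping the token vs. quoted-string syntax. It is not taken from any browser's source; the class name CharsetRewriteSketch and the method rewriteCharset are made up for illustration, and the regular expression is a simplification that does not handle a "charset=" sequence appearing inside another parameter's quoted-string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: NOT the algorithm used by any particular UA.
class CharsetRewriteSketch {

    // Group 1: the "; charset=" prefix as written by the caller.
    // Group 2: the value, either a double-quoted string or a bare token.
    // Single quotes have no special meaning in HTTP parameters, so a value
    // such as 'foo' is treated as a plain token here.
    private static final Pattern CHARSET = Pattern.compile(
            "(;\\s*charset=)(\"[^\"]*\"|[^;\\s\"]*)",
            Pattern.CASE_INSENSITIVE);

    // Returns the Content-Type value with every charset parameter set to
    // UTF-8; a value that has no charset parameter is returned unchanged.
    static String rewriteCharset(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean quoted = m.group(2).startsWith("\"");
            String replacement = m.group(1) + (quoted ? "\"UTF-8\"" : "UTF-8");
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/plain",
            "text/plain; charset=foo",
            "text/plain; charset=\"Iso-8859-1\"",
            "text/plain; charset=ISO-8859-1; format=flowed"
        };
        for (String s : samples) {
            System.out.println(s + "  ->  " + rewriteCharset(s));
        }
    }
}
```

This sketch never adds a charset parameter when none was set and always normalizes the casing to "UTF-8", so it would not reproduce the Firefox case-preserving behavior described in item 4.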
Test code:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.util.List;

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpHandler;
    import com.sun.net.httpserver.HttpServer;

    public class XHRContentTypeRewriting {

        public static void main(String[] args) throws IOException {
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/start", new ServeHtml());
            server.createContext("/report", new Report());
            server.setExecutor(null);
            server.start();
        }

        private static class ServeHtml implements HttpHandler {
            @Override
            public void handle(HttpExchange h) throws IOException {
                String response = "<html><head><title>XHR Content-Type Rewriting Test</title>"
                    + "<script>"
                    + "function post(type) {"
                    + " var req = new XMLHttpRequest();\n"
                    + " req.open ('POST', '/report', false);"
                    + " req.setRequestHeader('Content-Type', type);\n"
                    + " req.setRequestHeader('X-Test', type);\n"
                    + " req.send('pound: \\u00a3');\n"
                    + "}\n"
                    + "function run() {\n"
                    + " post('text/plain');\n"
                    + " post('text/plain; charset=foo');\n"
                    + " post('text/plain; charset=Iso-8859-1');\n"
                    + " post('text/plain; charset=Utf-8');\n"
                    + " post('text/plain; charset=\\'foo\\'');\n"
                    + " post('text/plain; charset=\\'Iso-8859-1\\'');\n"
                    + " post('text/plain; charset=\\'Utf-8\\'');\n"
                    + " post('text/plain; charset=\"foo\"');\n"
                    + " post('text/plain; charset=\"Iso-8859-1\"');\n"
                    + " post('text/plain; charset=\"Utf-8\"');\n"
                    + " post('text/plain; foo=\\'; charset=UTF-8');\n"
                    + " post('text/plain; format=flowed; charset=ISO-8859-1');\n"
                    + " post('text/plain; charset=ISO-8859-1; format=flowed');\n"
                    + "}\n"
                    + "</script>"
                    + "</head><body onload='run();'>"
                    + "</body></html>";
                h.getResponseHeaders().set("Content-Type", "text/html; charset=UTF-8");
                h.sendResponseHeaders(200, response.getBytes().length);
                OutputStream os = h.getResponseBody();
                os.write(response.getBytes());
                os.close();
            }
        }

        private static class Report implements HttpHandler {
            @Override
            public void handle(HttpExchange h) throws IOException {
                List<String> ua = h.getRequestHeaders().get("User-Agent");
                List<String> ct = h.getRequestHeaders().get("Content-Type");
                List<String> xt = h.getRequestHeaders().get("X-Test");
                InputStream is = h.getRequestBody();
                int r;
                StringBuilder payload = new StringBuilder();
                do {
                    r = is.read();
                    if (r >= 0)
                        payload.append(String.format("%02x ", r));
                } while (r >= 0);
                String response = "User-Agent: " + ua + "\n"
                    + " intended: " + xt + "\n"
                    + " received: " + ct + "\n"
                    + " payload: " + payload.toString() + "\n";
                System.err.println(response);
                h.getResponseHeaders().set("Content-Type", "text/plain; charset=UTF-8");
                h.sendResponseHeaders(200, response.getBytes().length);
                OutputStream os = h.getResponseBody();
                os.write(response.getBytes());
                os.close();
            }
        }
    }

Results, with comments added:

User-Agent: [Mozilla/5.0 (Windows NT 6.1; WOW64; rv:8.0) Gecko/20100101 Firefox/8.0]

  intended: [text/plain]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=foo]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Iso-8859-1]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Utf-8]
  received: [text/plain; charset=Utf-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='foo']
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='Iso-8859-1']
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='Utf-8']
  received: [text/plain; charset='Utf-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="foo"]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Iso-8859-1"]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Utf-8"]
  received: [text/plain; charset="Utf-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; foo='; charset=UTF-8]
  received: [text/plain; charset=UTF-8; foo='; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # confused by single quote in preceding param

  intended: [text/plain; format=flowed; charset=ISO-8859-1]
  received: [text/plain; format=flowed; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=ISO-8859-1; format=flowed]
  received: [text/plain; charset=UTF-8; format=flowed]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

User-Agent: [Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)]
# doesn't touch the type, thus sends inconsistent charset information

  intended: [text/plain]
  received: [text/plain]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=foo]
  received: [text/plain; charset=foo]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Iso-8859-1]
  received: [text/plain; charset=Iso-8859-1]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Utf-8]
  received: [text/plain; charset=Utf-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='foo']
  received: [text/plain; charset='foo']
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='Iso-8859-1']
  received: [text/plain; charset='Iso-8859-1']
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='Utf-8']
  received: [text/plain; charset='Utf-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="foo"]
  received: [text/plain; charset="foo"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Iso-8859-1"]
  received: [text/plain; charset="Iso-8859-1"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Utf-8"]
  received: [text/plain; charset="Utf-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; foo='; charset=UTF-8]
  received: [text/plain; foo='; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; format=flowed; charset=ISO-8859-1]
  received: [text/plain; format=flowed; charset=ISO-8859-1]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=ISO-8859-1; format=flowed]
  received: [text/plain; charset=ISO-8859-1; format=flowed]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

User-Agent: [Opera/9.80 (Windows NT 6.1; U; en) Presto/2.9.168 Version/11.51]
# doesn't touch the type, thus sends inconsistent charset information

  intended: [text/plain]
  received: [text/plain]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=foo]
  received: [text/plain; charset=foo]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Iso-8859-1]
  received: [text/plain; charset=Iso-8859-1]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Utf-8]
  received: [text/plain; charset=Utf-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='foo']
  received: [text/plain; charset='foo']
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='Iso-8859-1']
  received: [text/plain; charset='Iso-8859-1']
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='Utf-8']
  received: [text/plain; charset='Utf-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="foo"]
  received: [text/plain; charset="foo"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Iso-8859-1"]
  received: [text/plain; charset="Iso-8859-1"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Utf-8"]
  received: [text/plain; charset="Utf-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; foo='; charset=UTF-8]
  received: [text/plain; foo='; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; format=flowed; charset=ISO-8859-1]
  received: [text/plain; format=flowed; charset=ISO-8859-1]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=ISO-8859-1; format=flowed]
  received: [text/plain; charset=ISO-8859-1; format=flowed]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

User-Agent: [Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22]

  intended: [text/plain]
  received: [text/plain]
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # charset missing

  intended: [text/plain; charset=foo]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Iso-8859-1]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Utf-8]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='foo']
  received: [text/plain; charset='UTF-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # broken single quotes preserved, charset rewritten

  intended: [text/plain; charset='Iso-8859-1']
  received: [text/plain; charset='UTF-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # broken single quotes preserved, charset rewritten

  intended: [text/plain; charset='Utf-8']
  received: [text/plain; charset='UTF-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # broken single quotes preserved, charset rewritten

  intended: [text/plain; charset="foo"]
  received: [text/plain; charset="UTF-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Iso-8859-1"]
  received: [text/plain; charset="UTF-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Utf-8"]
  received: [text/plain; charset="UTF-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; foo='; charset=UTF-8]
  received: [text/plain; foo='; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; format=flowed; charset=ISO-8859-1]
  received: [text/plain; format=flowed; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=ISO-8859-1; format=flowed]
  received: [text/plain; charset=UTF-8; format=flowed]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

User-Agent: [Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2]

  intended: [text/plain]
  received: [text/plain]
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # charset missing

  intended: [text/plain; charset=foo]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Iso-8859-1]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=Utf-8]
  received: [text/plain; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset='foo']
  received: [text/plain; charset='UTF-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # broken single quotes preserved, charset rewritten

  intended: [text/plain; charset='Iso-8859-1']
  received: [text/plain; charset='UTF-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # broken single quotes preserved, charset rewritten

  intended: [text/plain; charset='Utf-8']
  received: [text/plain; charset='UTF-8']
  payload: 70 6f 75 6e 64 3a 20 c2 a3
  # broken single quotes preserved, charset rewritten

  intended: [text/plain; charset="foo"]
  received: [text/plain; charset="UTF-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Iso-8859-1"]
  received: [text/plain; charset="UTF-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset="Utf-8"]
  received: [text/plain; charset="UTF-8"]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; foo='; charset=UTF-8]
  received: [text/plain; foo='; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; format=flowed; charset=ISO-8859-1]
  received: [text/plain; format=flowed; charset=UTF-8]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

  intended: [text/plain; charset=ISO-8859-1; format=flowed]
  received: [text/plain; charset=UTF-8; format=flowed]
  payload: 70 6f 75 6e 64 3a 20 c2 a3

Hope this helps, Julian
Received on Wednesday, 16 November 2011 13:19:14 UTC