- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 7 Nov 2011 17:43:59 +0200
- To: public-webapps@w3.org
On Mon, Nov 7, 2011 at 9:57 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> It would be really nice if we could move forward with this thread.

I was planning on reporting back when I have something that passes all
mochitests. This has been delayed by other stuff, particularly fallout
from the new View Source highlighter.

> My preference is still to not do any HTML/XML specific processing when
> .responseType is set to anything other than "" or "document". This
> allows us to make encoding handling consistent for "text" and a
> possible future incremental text type.

My patch doesn't do HTML-specific processing when responseType is not ""
or "document".

> Also, the current spec leads to quite strange results if we end up
> supporting more text-based formats directly in XHR. For example in
> Gecko we've added experimental support for parsing into JSON. If we
> added this to a future version of XHR, this would mean that if a JSON
> resource was served as a "text/html" Content-Type, we'd simultaneously
> parse as HTML in order to detect encoding, and JSON in order to return
> a result to the page.

responseType == "" being weird that way with XML isn't new. I guess the
main difference is that mislabeling JSON as text/html might be more
probable than mislabeling it as XML when, e.g., PHP defaults to
text/html responses.

One way to address this is to not support new response types with
responseType == "" and force authors to set responseType to "json" if
they want to read responseJSON.

> So what I suggest is that we make the current steps 4 and 5 *only*
> apply if .responseType is set to "" or "document". This almost matches
> what we've implemented in Gecko, though in gecko we also skip step 6
> which IMHO is a bug (if for no other reason, we should skip a UTF8 BOM
> if one is present).

Makes sense.
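
As a rough illustration of the rule discussed above, here is a minimal
sketch in Python. It is not taken from the Gecko patch or from the spec
text; the function names, the UTF-8 default, and the choice to strip the
BOM only when decoding as UTF-8 are all assumptions made for the example.
It shows the two points under discussion: HTML/XML-specific encoding
detection is gated on responseType being "" or "document", and a leading
UTF-8 BOM is skipped when decoding plain text.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"


def uses_markup_sniffing(response_type: str) -> bool:
    """Hypothetical helper: HTML/XML-specific encoding detection applies
    only when responseType is "" or "document"."""
    return response_type in ("", "document")


def strip_utf8_bom(body: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark if one is present."""
    return body[len(UTF8_BOM):] if body.startswith(UTF8_BOM) else body


def decode_as_text(body: bytes, header_charset: Optional[str] = None) -> str:
    """Decode a response for a type such as "text" without looking at
    any markup. Assumption: fall back to UTF-8 when the Content-Type
    header gives no charset."""
    encoding = header_charset or "utf-8"
    if encoding.lower() in ("utf-8", "utf8"):
        body = strip_utf8_bom(body)
    return body.decode(encoding, errors="replace")
```

With this shape, a JSON resource mislabeled as text/html and requested
with responseType "json" would go through decode_as_text only, so the
body is never parsed as HTML just to find an encoding.
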
> As to the question which HTML charset encoding-detection rules to
> apply when .responseType is set to "" or "document" and content is
> served as HTML I'm less sure what the answer is. It appears clear that
> we can't reload a resource the same way normal page does when hitting
> a <meta> which wasn't found during prescan and which declares a
> charset different from the one currently used.
>
> However my impression is that a good number of HTML documents out
> there don't use UTF8 and do declare a charset using <meta> within the
> first 1024 bytes. Additionally I do hear *a lot* that authors have a
> hard time setting HTTP header due to not having full access to
> configurations of their hosting server (as well as configurations
> being hard to do even when access is available).
>
> Hence it seems like we at least want to run the prescan, though if
> others think otherwise I'd be interested to hear.

My current patch runs the prescan.

> There is also the issue of if we should take into account the encoding
> of the page which started the XHR (we do for navigation at least in
> Gecko), as well as if we should take user settings into account. I
> still believe that we'll exclude large parts of the world from
> transitioning to developing "AJAX" based websites if we drop all of
> these things, however I have not yet gathered that data.

I think we shouldn't take the encoding of the invoking page into
account. We have an excellent opportunity to avoid propagating that kind
of legacy badness. I think we should take the opportunity to make a new
feature less crazy.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 7 November 2011 15:44:29 UTC