Re: [XHR2] responseText for text/html before the encoding has stabilized

On Mon, Nov 7, 2011 at 9:57 AM, Jonas Sicking <jonas@sicking.cc> wrote:
> It would be really nice if we could move forward with this thread.

I was planning to report back once I have something that passes all
mochitests. This has been delayed by other work, particularly fallout
from the new View Source highlighter.

> My preference is still to not do any HTML/XML specific processing when
> .responseType is set to anything other than "" or "document". This
> allows us to make encoding handling consistent for "text" and a
> possible future incremental text type.

My patch doesn't do HTML-specific processing when responseType is not
"" or "document".

> Also, the current spec leads to quite strange results if we end up
> supporting more text-based formats directly in XHR. For example in
> Gecko we've added experimental support for parsing into JSON. If we
> added this to a future version of XHR, it would mean that if a JSON
> resource were served with a "text/html" Content-Type, we'd simultaneously
> parse as HTML in order to detect encoding, and JSON in order to return
> a result to the page.

That responseType == "" behaves weirdly this way with XML isn't new. I
guess the main difference is that mislabeling JSON as text/html might
be more probable than mislabeling it as XML when, e.g., PHP defaults to
text/html responses.

One way to address this is not to support new response types with
responseType == "" and to force authors to set responseType to "json" if
they want to read responseJSON.

> So what I suggest is that we make the current steps 4 and 5 *only*
> apply if .responseType is set to "" or "document". This almost matches
> what we've implemented in Gecko, though in Gecko we also skip step 6,
> which IMHO is a bug (if for no other reason, we should skip a UTF-8 BOM
> if one is present).

Makes sense.
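A minimal Python sketch of the gating Jonas proposes, for illustration only (it is not Gecko's implementation or the spec text): HTML/XML-specific encoding detection runs only for responseType "" or "document", and every other response type is decoded as plain text with a leading UTF-8 BOM skipped. The function names, the set of markup MIME types, letting the BOM override the header charset, and the UTF-8 fallback are all assumptions of this sketch.

```python
from typing import Optional

UTF8_BOM = b"\xef\xbb\xbf"
# Assumed set of MIME types that would get HTML/XML-specific handling.
MARKUP_TYPES = {"text/html", "text/xml", "application/xml"}


def needs_markup_detection(response_type: str, mime_type: str) -> bool:
    """True only when HTML/XML encoding detection should run at all."""
    is_markup = mime_type in MARKUP_TYPES or mime_type.endswith("+xml")
    return response_type in ("", "document") and is_markup


def decode_plain_text(body: bytes, header_charset: Optional[str]) -> str:
    """Decode with no HTML/XML processing (e.g. responseType "text")."""
    if body.startswith(UTF8_BOM):
        # Skip the BOM; letting it override the header is an assumption.
        return body[len(UTF8_BOM):].decode("utf-8", errors="replace")
    # Assumed fallback when the Content-Type carries no charset parameter.
    return body.decode(header_charset or "utf-8", errors="replace")
```

With this split, a JSON resource mislabeled as text/html and fetched with responseType "json" or "text" never reaches the HTML parser: needs_markup_detection("json", "text/html") is False.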

> As to the question of which HTML charset encoding-detection rules to
> apply when .responseType is set to "" or "document" and content is
> served as HTML, I'm less sure what the answer is. It appears clear that
> we can't reload a resource the same way a normal page does when hitting
> a <meta> which wasn't found during prescan and which declares a
> charset different from the one currently used.
>
> However, my impression is that a good number of HTML documents out
> there don't use UTF-8 and do declare a charset using <meta> within the
> first 1024 bytes. Additionally, I do hear *a lot* that authors have a
> hard time setting HTTP headers due to not having full access to the
> configuration of their hosting server (and configuration is hard to do
> even when access is available).
>
> Hence it seems like we at least want to run the prescan, though if
> others think otherwise I'd be interested to hear.

My current patch runs the prescan.
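For readers unfamiliar with the prescan, here is a heavily simplified Python stand-in. It only looks for a charset declared in a <meta> element within the first 1024 bytes; unlike the real prescan algorithm in the HTML spec, it does not skip comments, does not parse attributes properly, and does not validate the label against known encodings. The names simplified_prescan, PRESCAN_LIMIT and _META_CHARSET are hypothetical.

```python
import re
from typing import Optional

PRESCAN_LIMIT = 1024  # number of leading bytes examined

# Simplified pattern: catches <meta charset=...> as well as the charset=
# inside <meta http-equiv="Content-Type" content="...; charset=...">.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def simplified_prescan(body: bytes) -> Optional[str]:
    """Return the lowercased charset label declared by <meta>, if any."""
    head = body[:PRESCAN_LIMIT]  # a declaration past this point is ignored
    match = _META_CHARSET.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

A <meta> that appears only after the first 1024 bytes is not seen, which is exactly the case where a normal page would reload but an XHR cannot.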

> There is also the issue of whether we should take into account the encoding
> of the page which started the XHR (we do for navigation, at least in
> Gecko), as well as whether we should take user settings into account. I
> still believe that we'll exclude large parts of the world from
> transitioning to developing "AJAX"-based websites if we drop all of
> these things; however, I have not yet gathered that data.

I think we shouldn't take the encoding of the invoking page into
account. We have an excellent opportunity to avoid propagating that
kind of legacy badness. I think we should take the opportunity to make
a new feature less crazy.
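The position above amounts to a fixed precedence with no input for the invoking page's encoding or for user settings. A hypothetical Python sketch, assuming the order BOM, then HTTP charset, then prescan result, then a UTF-8 default; both that order and the default are assumptions here, not something settled in this thread.

```python
import codecs
from typing import Optional

# Byte order marks checked, with the encoding each one implies.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)


def sniff_bom(body: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, if there is one."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    return None


def resolve_html_encoding(
    body: bytes,
    header_charset: Optional[str],
    prescan_charset: Optional[str],
) -> str:
    """Pick an encoding for an HTML XHR response.

    Deliberately takes no parameter for the invoking page's encoding or
    the user's locale, so no legacy guess can leak in.
    """
    for candidate in (sniff_bom(body), header_charset, prescan_charset):
        if candidate:
            return candidate
    return "utf-8"  # assumed fixed default
```

Because nothing about the calling document is consulted, the same response bytes decode the same way regardless of which page issued the request.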

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 7 November 2011 15:44:29 UTC