Re: [XHR2] responseText for text/html before the encoding has stabilized from Jonas Sicking on 2011-11-07 (public-webapps@w3.org from October to December 2011)

From: Jonas Sicking <jonas@sicking.cc>
Date: Sun, 6 Nov 2011 23:57:52 -0800
To: Henri Sivonen <hsivonen@iki.fi>
Cc: public-webapps@w3.org
Message-ID: <CA+c2ei9GZLd-jWFCK3dGXHigSx6miY+40LNoY_rEH3k==BAz8g@mail.gmail.com>

It would be really nice if we could move forward with this thread.

My preference is still to not do any HTML/XML specific processing when
.responseType is set to anything other than "" or "document". This
allows us to make encoding handling consistent for "text" and a
possible future incremental text type.

Also, the current spec leads to quite strange results if we end up
supporting more text-based formats directly in XHR. For example, in
Gecko we've added experimental support for parsing into JSON. If we
added this to a future version of XHR, it would mean that if a JSON
resource was served with a "text/html" Content-Type, we'd
simultaneously parse it as HTML in order to detect the encoding, and
as JSON in order to return a result to the page.

So what I suggest is that we make the current steps 4 and 5 *only*
apply if .responseType is set to "" or "document". This almost matches
what we've implemented in Gecko, though in Gecko we also skip step 6,
which IMHO is a bug (if for no other reason, we should skip a UTF-8
BOM if one is present).
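
The proposal above can be sketched roughly as follows. This is only an
illustration, not spec text or Gecko code: it assumes steps 4 and 5
are the HTML/XML-specific charset detection and step 6 is the BOM
handling, as the message implies. The names decode_response,
SNIFFING_RESPONSE_TYPES and sniff_markup_charset are hypothetical, and
the UTF-8 fallback is an assumption.

```python
import codecs

# Hypothetical: the only responseType values for which the
# HTML/XML-specific detection (steps 4 and 5) would run.
SNIFFING_RESPONSE_TYPES = {"", "document"}


def decode_response(body, response_type, http_charset=None,
                    sniff_markup_charset=None):
    """Decode a response body to text under the proposed rule."""
    charset = http_charset

    # Steps 4 and 5 (assumed): markup-specific detection, gated on
    # responseType so that "text" and similar types never parse markup.
    if (charset is None
            and response_type in SNIFFING_RESPONSE_TYPES
            and sniff_markup_charset is not None):
        charset = sniff_markup_charset(body)

    # Assumed fallback when nothing declared an encoding.
    if charset is None:
        charset = "utf-8"

    # Step 6 (assumed): applies to every responseType; skip a UTF-8
    # BOM if one is present.
    if (codecs.lookup(charset).name == "utf-8"
            and body.startswith(codecs.BOM_UTF8)):
        body = body[len(codecs.BOM_UTF8):]

    return body.decode(charset, errors="replace")
```

With this shape, a JSON resource served as "text/html" and requested
with a non-document responseType is never run through HTML encoding
detection, but a leading UTF-8 BOM is still dropped.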

As for which HTML charset-detection rules to apply when .responseType
is set to "" or "document" and the content is served as HTML, I'm less
sure what the answer is. It appears clear that we can't reload a
resource the way a normal page load does when it hits a <meta> which
wasn't found during the prescan and which declares a charset different
from the one currently in use.

However, my impression is that a good number of HTML documents out
there don't use UTF-8 and do declare a charset using <meta> within the
first 1024 bytes. Additionally, I hear *a lot* that authors have a
hard time setting HTTP headers because they lack full access to the
configuration of their hosting server (and because configuration is
hard to do even when access is available).

Hence it seems like we at least want to run the prescan, though if
others think otherwise I'd be interested to hear.
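
A much-simplified sketch of such a prescan is shown below. It is not
the HTML parsing algorithm's actual prescan, which also handles
comments, attribute parsing and encoding label validation; it only
looks for a charset declaration in a <meta> tag within the first 1024
bytes. The names PRESCAN_LIMIT and prescan_meta_charset are
hypothetical.

```python
import re

# Only the first 1024 bytes are examined, as discussed above.
PRESCAN_LIMIT = 1024

# Simplified pattern: a <meta ...> tag containing charset=VALUE,
# covering both <meta charset="..."> and the http-equiv/content form.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def prescan_meta_charset(body):
    """Return the lowercased charset label declared in a <meta>, or None."""
    match = _META_CHARSET.search(body[:PRESCAN_LIMIT])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Because the whole resource is never re-read, a <meta> that appears
after the first 1024 bytes is simply ignored here instead of
triggering a reload.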

There is also the issue of whether we should take into account the
encoding of the page which started the XHR (we do for navigation, at
least in Gecko), as well as whether we should take user settings into
account. I still believe that we'll exclude large parts of the world
from transitioning to developing "AJAX"-based websites if we drop all
of these things, though I have not yet gathered data on that.

/ Jonas

Received on Monday, 7 November 2011 07:59:00 UTC