- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 30 Sep 2011 09:43:12 +0300
- To: public-webapps@w3.org
On Thu, Sep 29, 2011 at 11:27 PM, Jonas Sicking <jonas@sicking.cc> wrote:
>> Finally, XHR allows the programmer using XHR to override the MIME
>> type, including the charset parameter, so if the person adding new
>> XHR code can't change the encoding declarations on legacy data,
>> (s)he can override the UTF-8 last resort from JS (and a given
>> repository of legacy data pretty often has a self-consistent
>> encoding that the XHR programmer can discover ahead of time). I
>> think requiring the person adding XHR code to write that line is
>> much better than adding more locale and/or user setting-dependent
>> behavior to the Web platform.
>
> This is certainly a good point, and is likely generally the easiest
> solution for someone rolling out an AJAX version of a new website
> rather than requiring webserver configuration changes. However, it
> still doesn't solve the case where a website uses different
> encodings for different documents, as described above.

If we want to *really* address that problem, I think the right way to
do it would be to add a way for XHR to override the HTML last resort
encoding, so that authors who are dealing with a content repository
partially migrated to UTF-8 can set the last resort to the legacy
encoding they know they have, instead of ending up overriding the
whole HTTP Content-Type for the UTF-8 content.

(I'm assuming here that if someone is migrating a site from a legacy
encoding to UTF-8, the UTF-8 parts declare that they are UTF-8.
Authors who migrate to UTF-8 but, even after realizing that legacy
encodings suck and UTF-8 rocks, are *still* too clueless to *declare*
that they use UTF-8 don't deserve any further help from browsers,
IMO.)

> I'm particularly keen to hear how this will affect locales which do
> not use ASCII by default. Most of the content I personally consume
> is written in English or Swedish, most of which is generally legible
> even if decoded using the wrong encoding. I'm under the impression
> that that is not the case for, for example, Chinese or Hindi
> documents. I think it would be sad if we went with any particular
> solution here without consulting people from those locales.

The old way of putting Hindi content on the Web relied on
intentionally misencoded downloadable fonts. From the browser's point
of view, such deep legacy text is Windows-1252. Hindi content that
works without misencoded fonts is UTF-8. So I think Hindi isn't
relevant to this thread.

Users in CJK and Cyrillic locales are the ones most hurt by authors
not declaring their encodings (well, actually, readers of CJK and
Cyrillic languages whose browsers are configured for other locales
are hurt *even* more), so I think it would be completely backwards
for browsers to complicate new features in order to enable authors
in the CJK and Cyrillic locales to deploy *new* features and *still*
not declare encodings. Instead, I think we should design new features
to make authors everywhere get their act together and declare their
encodings. (Note that this position is much less extreme than the
more enlightened position that e.g. HTML5 App Cache manifests take:
requiring everyone to use UTF-8 for a new feature so that
declarations aren't needed.)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
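For concreteness, the JS-side override discussed above is the
existing XMLHttpRequest.overrideMimeType() method. A minimal sketch,
assuming the programmer has discovered ahead of time that a legacy
repository is consistently Windows-1252 (the URL and charset here are
made-up examples):

    var xhr = new XMLHttpRequest();
    // Force the decoder to Windows-1252 for this known-legacy
    // resource, regardless of what (if anything) the server
    // declares. overrideMimeType() must be called before send().
    xhr.overrideMimeType("text/plain; charset=windows-1252");
    xhr.open("GET", "/legacy/archive/page.txt", true);
    xhr.onload = function () {
      // responseText has been decoded as Windows-1252.
      console.log(xhr.responseText);
    };
    xhr.send();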
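By contrast, the last-resort override proposed above does not exist
in XHR; if something like it were added, usage might look like the
following sketch, where fallbackEncoding is a hypothetical attribute
name, not a real API:

    var xhr = new XMLHttpRequest();
    // Hypothetical attribute: replaces only the *last resort*
    // decoder, i.e. the one used when the response declares no
    // encoding at all. Responses that declare UTF-8 still decode as
    // UTF-8, so a partially migrated repository works without
    // per-request Content-Type overrides.
    xhr.fallbackEncoding = "windows-1252";
    xhr.open("GET", "/partially-migrated/old-page.html", true);
    xhr.send();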
Received on Friday, 30 September 2011 06:43:49 UTC