
[whatwg] Default encoding to UTF-8?

From: Faruk Ates <farukates@me.com>
Date: Wed, 30 Nov 2011 15:28:40 -0800
Message-ID: <1462FAA3-FD8E-4124-9EE0-5DA8A506677B@me.com>
My understanding is that all browsers* default to Western Latin (ISO-8859-1) encoding (for Western-world downloads/OSes) due to legacy content on the web. But how relevant is that still today? Has any browser vendor done any recent research into the need for this?

I'm wondering if it might not be good to start encouraging defaulting to UTF-8, and only fall back to Western Latin if it is detected that the content is very old / served by old infrastructure or servers, etc. And of course if the content is served with an explicit encoding of Western Latin.
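As a rough illustration of that idea, here is a minimal sketch in Python. It is not how any browser actually behaves. The function name `decode_with_fallback` is hypothetical, and the "detection" step is an assumption: instead of judging whether content is old, it simply checks whether the bytes are valid UTF-8 and falls back to ISO-8859-1 when they are not.

```python
from typing import Optional, Tuple


def decode_with_fallback(raw: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Decode raw bytes, returning (text, encoding_used).

    Hypothetical helper: honours an explicitly declared encoding,
    otherwise tries UTF-8 first and falls back to ISO-8859-1.
    """
    if declared:
        # An explicit encoding always wins over any default.
        return raw.decode(declared, errors="replace"), declared
    try:
        # Strict decoding raises on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: invalid UTF-8 is treated as legacy Western Latin content.
        # ISO-8859-1 maps every byte to a character, so this never fails.
        return raw.decode("iso-8859-1"), "iso-8859-1"
```

For example, `decode_with_fallback("Faruk Ateş".encode("utf-8"))` takes the UTF-8 path, while `decode_with_fallback(b"caf\xe9")` takes the fallback path because the lone `0xE9` byte is not valid UTF-8. Pure-ASCII content decodes identically either way, so the choice of default only matters for bytes above `0x7F`.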

We like to think that “every web developer is surely building things in UTF-8 nowadays” but this is far from true. I still frequently break websites and webapps simply by entering my name (Faruk Ateş). Occasionally people complain or file issues about my and others’ open source scripts because their build systems or compilers break due to my name in the copyright, or something as innocuous as a proper apostrophe (’ rather than ') in a comment.

Yes, I understand that that particular issue is something we ought to fix through evangelism, but I think that WHATWG/browser vendors can help with this while at the same time (rightly, smartly) making the case that the web of tomorrow should be a UTF-8 (and 16) based one, not a smorgasbord of different encodings.

Hence my question whether any vendor has done any recent research into this. Mobile browsers seem to have followed desktop browsers in this; perhaps this topic was tested and researched in recent times as part of that, but I couldn't find any such data. The only really relevant thread of discussion around UTF-8 as a default was this one about Web Workers:
http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-September/023197.html

…which basically suggested that everyone is hugely in favor of UTF-8 and making it a default wherever possible.

So how 'bout it? What's going on in this area, if anything?

Sincerely,
Faruk Ateş


* No idea about IE, admittedly, but I presume it follows along in this.
Received on Wednesday, 30 November 2011 15:28:40 UTC
