- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Sat, 24 Jul 2010 02:02:39 -0400
On 7/24/10 1:50 AM, Brett Zamir wrote:
>> I would be particularly interested in data on this last, across
>> different browsers, operating systems, and locales... There seem to be
>> servers out there expecting their URIs in UTF-8 and others expecting
>> them in ISO-8859-1, and it's not clear to me how to make things work
>> with them all.
>
> Seems to me that if they are not in UTF-8, they should be treated as
> bugs, even if that is not a de jure standard.

Treated as bugs by whom?

The scenario is that a user types some non-ASCII text in the URL bar. This text needs to be percent-encoded to actually go on the wire, which raises the question of which encoding to use. If the user is using IRIs, the answer is UTF-8. A number of servers barf if you do this, especially because some server-side scripting languages (PHP, e.g., last I checked) default to URI-unescaping via something other than UTF-8.

So some browsers encode the non-query part of the URI as UTF-8 and the query part as ... something (the user's default filesystem encoding, say, for lack of a better guess). Others always use UTF-8 (and end up with some servers not usable). Others... I have no idea. That's why I want data. ;)

In particular, while the "just use UTF-8, and if the user can't access the site, sucks to be the user" approach has a certain theoretical-purity appeal, it doesn't seem like something I want to do to my friends and family (always a good criterion for things you'd like to do to users).

-Boris
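To make the encoding mismatch concrete, here is a minimal Python sketch (not anything from the thread). It percent-encodes the same typed text as UTF-8 and as ISO-8859-1 and shows what a server recovers when it decodes with the other charset. It also sketches the "UTF-8 path, guessed charset for the query" strategy described above. The host `example.com`, the sample text, the helper name `encode_mixed`, and the choice of ISO-8859-1 as the fallback query charset are all illustrative assumptions; real browsers may pick a different fallback.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Hypothetical text a user might type into the URL bar.
text = "café"

# The same characters percent-encode differently depending on the charset.
utf8_form = quote(text, encoding="utf-8")         # 'caf%C3%A9'
latin1_form = quote(text, encoding="iso-8859-1")  # 'caf%E9'

# A server that unescapes with the "other" charset gets the wrong text.
utf8_read_as_latin1 = unquote(utf8_form, encoding="iso-8859-1")  # 'cafÃ©'
latin1_read_as_utf8 = unquote(latin1_form, encoding="utf-8")     # 'caf\ufffd' (U+FFFD)


def encode_mixed(url: str, query_encoding: str = "iso-8859-1") -> str:
    """Hypothetical helper: encode the path as UTF-8 and the query with a
    guessed legacy charset. query_encoding is an assumed stand-in for
    whatever default a given browser might choose. Characters that the
    fallback charset cannot represent make quote() raise UnicodeEncodeError
    here; this sketch does not try to handle that case."""
    parts = urlsplit(url)
    path = quote(parts.path, safe="/", encoding="utf-8")
    query = quote(parts.query, safe="=&", encoding=query_encoding)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

For instance, `encode_mixed("http://example.com/café?q=café")` yields `http://example.com/caf%C3%A9?q=caf%E9`. The same word goes out as two different byte sequences depending on which part of the URI it lands in, and that is exactly the behavior a server has to guess at.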
Received on Friday, 23 July 2010 23:02:39 UTC