Re: Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization from Martin J. Dürst on 2013-12-23 (www-international@w3.org from October to December 2013)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 23 Dec 2013 18:17:02 +0900
To: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
CC: Henri Sivonen <hsivonen@hsivonen.fi>, www-international@w3.org
Message-ID: <52B7FF8E.1070302@it.aoyama.ac.jp>

On 2013/12/23 9:00, Leif Halvard Silli wrote:
> Henri Sivonen, Thu, 19 Dec 2013 16:29:37 +0200:
>>
>> The list of TLDs that participate in the guessing and are not
>> windows-1252-affiliated is currently:
>>
>> https://bugzilla.mozilla.org/attachment.cgi?id=8341644&action=diff#a/dom/encoding/domainsfallbacks.properties_sec2

> But not all domains are “legacy domains” either. Consider, from the
> above list, line 139 and 140:
>
>  139 ru=windows-1251
>  140 xn--p1ai=windows-1251
>
> where xn--p1ai refers to the RF-domain - .рф. Is there really no
> correlation between UTF-8 based domain names and use of the UTF-8
> encoding ... ?

I don't think non-ASCII domain names should be called UTF-8 based domain 
names, but the general thought that these rather new domains might 
contain considerably less legacy content than the two-letter ASCII 
country domains seems quite attractive.
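
To make the mapping under discussion concrete, here is a minimal sketch in
Python of a TLD-based fallback lookup. It is not Mozilla's implementation:
the names FALLBACKS, DEFAULT_FALLBACK, tld_of, fallback_encoding and
to_unicode_label are hypothetical, the table holds only the two entries
quoted above, and the windows-1252 default is an assumption for
illustration. It also shows that xn--p1ai is simply the ASCII (Punycode)
form of the label рф, which is why "UTF-8 based" is not an accurate
description of such names.

```python
# Hypothetical sketch of a TLD -> fallback encoding lookup.
# Only the two entries quoted in this thread are included.
FALLBACKS = {
    "ru": "windows-1251",
    "xn--p1ai": "windows-1251",  # ASCII (Punycode) form of the label рф
}

# Assumption for illustration: TLDs not listed fall back to windows-1252.
DEFAULT_FALLBACK = "windows-1252"


def tld_of(hostname: str) -> str:
    """Return the last label of hostname in lowercase ASCII form."""
    labels = hostname.rstrip(".").lower().split(".")
    # Python's built-in "idna" codec converts a non-ASCII label to its
    # xn-- form and passes plain ASCII labels through unchanged.
    return labels[-1].encode("idna").decode("ascii")


def fallback_encoding(hostname: str) -> str:
    """Look up the fallback encoding for the hostname's TLD."""
    return FALLBACKS.get(tld_of(hostname), DEFAULT_FALLBACK)


def to_unicode_label(label: str) -> str:
    """Convert an xn-- label back to its Unicode form."""
    return label.encode("ascii").decode("idna")
```

Under these assumptions, fallback_encoding("example.ru") and
fallback_encoding("пример.рф") both give windows-1251, because the lookup
is keyed on the ASCII form of the label regardless of how the name was
typed.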

Overall, I agree with others who have asked what the expected "ROI" on
this is. With UTF-8 becoming more and more popular for Web sites, the
return on changing fallback encodings is definitely diminishing.

Regards,   Martin.

Received on Monday, 23 December 2013 09:17:48 UTC