RE: Auto-detect and encodings in HTML5

From time to time I do have to select the encoding manually, maybe because
Hebrew is not my default language. The heuristic detectors could be improved.

Jony

-----Original Message-----
From: www-international-request@w3.org
[mailto:www-international-request@w3.org] On Behalf Of Henri Sivonen
Sent: Monday, June 01, 2009 11:54 AM
To: Jonathan Rosenne
Cc: 'M.T.Carrasco Benitez'; 'Travis Leithead'; 'Erik van der Poel';
public-html@w3.org; www-international@w3.org; 'Richard Ishida'; 'Ian
Hickson'; 'Chris Wilson'; 'Harley Rosnow'
Subject: Re: Auto-detect and encodings in HTML5

On Jun 1, 2009, at 11:08, Jonathan Rosenne wrote:

> Not only CJK and Cyrillic, also Hebrew and


I had thought that existing Hebrew content largely didn't have the  
problem of lacking encoding labels. (Isn't even the most legacy Visual  
Hebrew content generally *encoding*-labeled even if not *direction*- 
labeled?)

I observe that existing heuristic detectors don't tend to support  
Hebrew encodings. This suggests that either content is generally  
labeled or there's one dominant encoding (which one? Windows-1255?),  
since developing heuristic detection wasn't necessary to break into  
the Hebrew browsing market.

How bad is breakage if a non-Hebrew encoding default is in effect and  
the user browses the Hebrew part of the Web?
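To make the simplest failure mode concrete: in Windows-1255 the Hebrew
letters occupy bytes 0xE0 to 0xFA, and in Windows-1252 those same bytes are
accented Latin letters. A minimal Python sketch, assuming an unlabeled
Windows-1255 page and a Windows-1252 last-resort default (it does not model
Visual Hebrew ordering):

```python
# The Hebrew word "shalom" as it would be stored in a Windows-1255 page.
word = "שלום"
raw = word.encode("cp1255")

# Decoding with the right encoding recovers the text.
print(raw.decode("cp1255"))  # שלום

# Decoding the same bytes with a Western default yields Latin mojibake.
print(raw.decode("cp1252"))  # ùìåí
```

Every letter survives as a single byte, so the page does not crash or drop
characters; it simply becomes unreadable until the user overrides the encoding.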

> I suppose many other non-Latin languages.

There are also Latin non-Windows-1252 encodings, but it doesn't  
automatically follow that there's a serious legacy of unlabeled  
content in every legacy encoding. (Serious meaning: Users would reject  
a browser that didn't allow them to set a locale-specific last-resort  
encoding or that didn't tie a locale-specific last-resort encoding to  
the UI language.)
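A rough sketch of what "tying a locale-specific last-resort encoding to the
UI language" could look like. The lookup order (byte order mark, then an
explicit label, then a per-locale default) and the table below are
assumptions for illustration only; `LAST_RESORT_BY_UI_LANG` and
`pick_encoding` are hypothetical names, not any browser's actual code.

```python
import codecs
from typing import Optional

# Hypothetical UI-language -> last-resort encoding table (illustrative only).
LAST_RESORT_BY_UI_LANG = {
    "he": "windows-1255",
    "ru": "windows-1251",
    "ja": "shift_jis",
}
DEFAULT_LAST_RESORT = "windows-1252"


def pick_encoding(body: bytes, declared_label: Optional[str], ui_lang: str) -> str:
    """Choose an encoding for `body`: BOM first, then label, then locale default."""
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_LE) or body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    if declared_label:
        # An explicit label (HTTP header or meta) wins over any default.
        return declared_label.strip().lower()
    # Unlabeled content: fall back on the default tied to the UI language,
    # using only the primary subtag ("he-IL" -> "he").
    primary = ui_lang.split("-")[0].lower()
    return LAST_RESORT_BY_UI_LANG.get(primary, DEFAULT_LAST_RESORT)
```

With such a table, an unlabeled Hebrew page renders correctly for a user
with a Hebrew UI but not for one with an English UI, which is the kind of
dependency the question above is probing.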

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Monday, 1 June 2009 09:17:51 UTC