Leif Halvard Silli wrote:
> Andrew Cunningham, on 09-10-12 16.12:
>> Also not surprised by the Indian localisations; it had to be either
>> utf-8 or win-1252. And I guess win-1252 is a logical choice, since
>> Firefox doesn't really support legacy encodings for Indian languages,
>> and a good percentage of legacy content in Indian languages is
>> misidentifying itself as iso-8859-1 or windows-1252 and relying on
>> styling.
>
> Styling? You mean, the good old "font tag considered harmful" effect?
> Is that even possible to get to work any more? I know that Hebrew on
> the Web used to apply similar tricks - I think they used "the default
> latin encoding" and then "turned the text". But still, win-1252 isn't
> the default encoding of Hebrew?!
>
> Do you have example pages for wrong Indian language pages?
Most modern Indian language websites use Unicode, so my comments were
referring to legacy content, since that is the context Ian has been
referring to, if I understand correctly. Although I'm not sure why
HTML5 is concerning itself with legacy content: it's unlikely that the
HTML5 spec can cover all the needs of all legacy content, so it would be
best to just get HTML5 content right.
Personally, I'm more concerned about the limitations in correctly
displaying some Unicode content than I am about supporting legacy content.
Just going through a few online Indian language newspapers that aren't in
utf-8, the pages fall into three categories:
1) No encoding declaration - so whatever the browser default is gets used,
e.g. http://www.aajkaal.net/
2) Declared as iso-8859-1 (which browsers treat as win-1252), e.g.
http://www.abasar.net/ and http://www.manoramaonline.com/
3) Encoding declared as x-user-defined, e.g. http://www.anandabazar.com/
although at least in IE (English UI) x-user-defined is parsed as
windows-1252, so in that version of the browser declaring x-user-defined
was effectively the same as declaring iso-8859-1 or windows-1252 (the
effect of this remapping on the bytes is sketched below).
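To make that win-1252 fallback concrete, here is a rough sketch (in
Python, with invented byte values, since every legacy Indic font uses its
own assignments) of what happens when bytes intended for a custom 8-bit
font are decoded as windows-1252: you get ordinary Latin letters and
punctuation, and only the page's custom font turns those code points back
into the intended glyphs, which is what I meant by "relying on styling".

    # Sketch only: the byte values are invented; real legacy Indic font
    # encodings assign glyphs to these positions differently, font by font.
    legacy_bytes = bytes([0xC5, 0xE1, 0x8A, 0x97, 0xB4])

    # A page declared as iso-8859-1 (and, in the IE case above, as
    # x-user-defined) ends up being decoded as windows-1252.
    as_seen_by_browser = legacy_bytes.decode("windows-1252")
    print(as_seen_by_browser)
    # Prints a run of Latin letters and punctuation.  Without the matching
    # legacy font applied through CSS or <font>, that is what the reader
    # sees; with the font, the same code points are drawn as the intended
    # glyphs, but the underlying text is still not Unicode, so searching,
    # copying and indexing all break.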
Which is why a lot of legacy content in some SE Asian scripts was always
delivered as images or PDF files rather than as text in HTML documents.
Browsers assumed a win-1252 fallback, so it was impossible to mark up
content in some languages using legacy encodings. The Karen languages
tended to fall into this category, and content is still delivered this
way by key websites in those languages, although bloggers are migrating
to pseudo-Unicode font solutions. It is interesting to note that there is
limited uptake of Unicode 5.1+ solutions for Karen, since web browsers
are unable to correctly render or display Karen Unicode documents using
the existing fonts that support the Karen languages. Partly this is due
to limitations in CSS and in web browsers.
And I'm not sure how web browsers will be able to deal with the Unicode
vs pseudo-Unicode divisions occurring in Burmese, Karen, Mon and Shan
web content. I suspect that for these languages browser developers have
limited or no knowledge of how the web developer community is developing
content, or of what encodings are in use, or even that a Unicode vs
pseudo-Unicode content distinction exists in these languages.
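For what it's worth, below is a very rough sketch of the kind of
heuristic a detection tool might use to guess whether a run of
Myanmar-script text is standard Unicode or Zawgyi-style pseudo-Unicode.
The specific checks are illustrative only, not something any browser
actually does, and the second check shows exactly why this is hard: the
code points that pseudo-Unicode fonts repurpose are the same ones that
Unicode 5.1 allocated to Mon, Karen, Shan and other languages.

    def classify_myanmar_text(text: str) -> str:
        """Rough guess only: 'pseudo-unicode', 'ambiguous' or 'unicode'."""
        prev = " "
        ambiguous = False
        for ch in text:
            cp = ord(ch)
            # Zawgyi-style encodings store text in visual order, so the
            # vowel sign E (U+1031) can turn up at the start of a syllable;
            # in standard Unicode it always follows a consonant or medial.
            if cp == 0x1031 and (prev.isspace() or prev in "\u104A\u104B"):
                return "pseudo-unicode"
            # Pseudo-Unicode fonts repurpose code points that Unicode 5.1
            # assigned to Mon, Karen, Shan and others (roughly
            # U+105A-U+1097), so seeing them could mean either
            # pseudo-Unicode Burmese or genuine ethnic-language Unicode text.
            if 0x105A <= cp <= 0x1097:
                ambiguous = True
            prev = ch
        return "ambiguous" if ambiguous else "unicode"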
Andrew
--
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Ph: +61-3-8664-7430
Fax: +61-3-9639-2175
Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com
http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au