Leif Halvard Silli wrote:
> Andrew Cunningham On 09-10-12 16.12:
>> also not surprised by the indian localisations, had to be either
>> utf-8 or win-1252. and guess win-1252 is a logical choice since
>> firefox doesn't really support legacy encodings for Indian
>> languages, and good percentage of legacy content in indian
>> languages is misidentifying itself as iso-8859-1 or windows-1252
>> and relying on styling.
>
> Styling? You mean, the good old "font tag considered harmful" effect?
> Is that even possible to get to work any more? I know that Hebrew on
> the Web used to apply similar tricks - I think they used "the default
> latin encoding" and then "turned the text". But still, win-1252 isn't
> the default encoding of Hebrew?!
>
> Do you have example pages for wrong Indian language pages?

Most modern Indian language websites use Unicode, so my comments were referring to legacy content, since that is the context Ian has been referring to, if I understand correctly.

That said, I'm not sure why HTML5 is concerning itself with legacy content. It's unlikely that the HTML5 spec can cover all the needs of all legacy content, so it is best to just get HTML5 content right. Personally, I'm more concerned about the limitations in correctly displaying some Unicode content than I am about supporting legacy content.

Just going through a few online Indian language newspapers that aren't in utf-8, pages fall into three categories:

1) No encoding declaration, so whatever the browser default is gets used, e.g. http://www.aajkaal.net/

2) Declared as iso-8859-1 (which browsers treat as win-1252), e.g. http://www.abasar.net/ and http://www.manoramaonline.com/

3) Declared as x-user-defined, e.g. http://www.anandabazar.com/

At least in IE (English UI), though, x-user-defined is parsed as Windows-1252, so in that version of the browser declaring x-user-defined was effectively the same as declaring iso-8859-1 or windows-1252.
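The three categories above can be sketched as a small check on a page's bytes. This is only an illustration, not something taken from the thread: `classify`, `META_CHARSET` and `TREATED_AS_WIN1252` are hypothetical names, and the sketch assumes the declaration sits in a `<meta>` element near the top of the page (an HTTP `Content-Type` header is not consulted).

```python
import re

# Hypothetical helper pattern: capture the charset label from a <meta>
# declaration, in either the http-equiv form or the short charset form.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)

# Labels that, per the discussion above, end up decoded as windows-1252.
TREATED_AS_WIN1252 = {"iso-8859-1", "windows-1252"}


def classify(html_bytes: bytes) -> str:
    """Sort a page into one of the three categories listed above."""
    # Assumption: only the first 1024 bytes are scanned for a declaration.
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return "1: no declaration, browser default applies"
    label = match.group(1).decode("ascii").lower()
    if label in TREATED_AS_WIN1252:
        return "2: declared iso-8859-1/windows-1252"
    if label == "x-user-defined":
        return "3: declared x-user-defined"
    return "other: " + label


sample = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(classify(sample))
```

With the sample declaration shown, the page lands in category 2. A real check would also need to look at the HTTP headers, which can override or replace the in-page declaration.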
Which is why a lot of legacy content in some SE Asian scripts was always delivered as images or PDF files, rather than as text in HTML documents. Browsers assumed a win-1252 fallback, so it was impossible to mark up content in some languages using legacy encodings. The Karen languages tended to fall into this category, and content is still delivered this way by key websites in that language, although bloggers are migrating to pseudo-Unicode font solutions.

It is interesting to note that there is limited take-up of Unicode 5.1+ solutions for Karen, since web browsers are unable to correctly render or display Karen Unicode documents using the existing fonts that support the Karen languages. Partly this is due to limitations in CSS and in web browsers.

And I'm not sure how web browsers will be able to deal with the Unicode vs pseudo-Unicode divisions occurring in Burmese, Karen, Mon and Shan web content.

I suspect that for these languages, browser developers have limited or no knowledge of how the web developer community is developing content in these languages or what encodings are in use, or that there is even a Unicode vs pseudo-Unicode content distinction in these languages.

Andrew

-- 
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000

Ph: +61-3-8664-7430
Fax: +61-3-9639-2175

Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com

http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au

Received on Wednesday, 14 October 2009 01:53:45 GMT