Leif Halvard Silli wrote:
> Andrew Cunningham On 09-10-14 03.53:
>
>
> The reason, as much as I have picked up, is about market shares. And
> the "poster child" here is Windows-1252.
>
I realise that, but if market share is the issue, then the reality is
that Microsoft is setting the trends here, having the lion's share of the
market in terms of OS. And if you look at Microsoft policy, all new
languages not encompassed by an existing code page are ONLY supported
via Unicode. It's been said often enough, in enough forums, over the years.
>>
>>
>> 3) declare encoding as x-user-defined, e.g. http://www.anandabazar.com/
>>
>> although at least in IE (English UI) x-user-defined is parsed as
>> Windows-1252, so in that version of the browser declaring
>> x-user-defined was effectively the same as declaring iso-8859-1 or
>> windows-1252.
>>
>> Which is why a lot of legacy content in some SE Asian scripts was
>> always delivered as images or PDF files, rather than as text in HTML
>> documents.
>
>
> Which are served just as well as UTF-8?
>
>> Browsers assumed a win-1252 fall back so it was impossible to mark up
>> content in some languages using legacy content. The Karen
>> languages tended to fall into this category, and content is still
>> delivered this way by key websites in that language, although
>> bloggers are migrating to using pseudo-Unicode font solutions.
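
As a rough illustration of the fallback described in the quoted text: when a browser treats undeclared or x-user-defined bytes as windows-1252, bytes meant for a glyph-encoded 8-bit font simply come out as Latin characters. The sketch below uses made-up byte values (not taken from any real font encoding) and Python's standard codecs to show the result, and where windows-1252 and iso-8859-1 differ.

```python
# Hypothetical bytes from a page authored for a glyph-encoded 8-bit font.
# The values are invented for illustration only.
legacy_bytes = bytes([0x75, 0x93, 0xE4])

# What a windows-1252 fallback turns them into: ordinary Latin text.
as_cp1252 = legacy_bytes.decode("cp1252")

# iso-8859-1 agrees except for 0x80-0x9F, which it maps to C1 controls.
as_latin1 = legacy_bytes.decode("latin-1")

print([hex(ord(ch)) for ch in as_cp1252])
print([hex(ord(ch)) for ch in as_latin1])
```

Either way the stored characters are Latin ones; the intended script only appears if the reader happens to have the matching font installed.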
>
>
> What do you mean by "pseudo-Unicode"?
>
Pseudo-Unicode is the practice of remapping glyph-based 8-bit legacy
encodings onto Unicode fonts. In terms of the Myanmar script, for Burmese,
etc., this means remapping some glyphs to their actual Unicode codepoints
and assigning the other glyphs to codepoints in the same block that are
unused by the language in question, or to the PUA, with glyphs accessed
directly by codepoint.

Unicode uses a character-based model. Pseudo-Unicode uses a glyph-based
model that in many instances reassigns glyphs to codepoints required by
other languages using the same script.

For instance, with Burmese, the majority of online content uses a
pseudo-Unicode font that reuses codepoints required for Mon, S'gaw
Karen, Shan and other languages. Pseudo-Unicode data cannot be correctly
displayed or read with Unicode-capable fonts, either the Unicode 4.1/5.0
version fonts or the Unicode 5.1+ fonts.
At the moment pseudo-Unicode is more common for Burmese web content than
Unicode, and in some projects it has led to splintering, e.g. the Burmese
Wikipedia project that uses Unicode 5.1 vs a splinter group that created
a new wiki using pseudo-Unicode. It's a political issue in the Burmese web
development and IT communities.
>
> Forgive me for being occupied with those languages which are already
> supported. Here is some Mozilla criticism:
>
Nothing to forgive. I spent many, many years myself concerned with those
languages, but there are many languages whose needs are forgotten by
developers and specification writers.
Andrew
--
Andrew Cunningham
Senior Manager, Research and Development
Vicnet
State Library of Victoria
328 Swanston Street
Melbourne VIC 3000
Ph: +61-3-8664-7430
Fax: +61-3-9639-2175
Email: andrewc@vicnet.net.au
Alt email: lang.support@gmail.com
http://home.vicnet.net.au/~andrewc/
http://www.openroad.net.au
http://www.vicnet.net.au
http://www.slv.vic.gov.au