RE: Languages in Impress / ODF

Hi Christophe,


> Thanks for the info. (Maybe we should discuss this on another list.)

Perhaps we should :-) Let me just answer this one here... If you've
found additional issues with OOo or other ODF-capable products, it
would be helpful if you could send me a list of those issues plus some
sample documents (or talk to Louis from OOo or Mingfei Jia from IBM),
so we can figure out whether each one is a bug in the implementation
and/or something that needs to be fixed in the spec itself.


> It gets weirder when you use a language that is not in
> OpenOffice.org's list of "Western" languages (a list that funnily
> includes Mongolian, Swahili, Vietnamese and Georgian!): when you
> enable "Asian" language, set "Chinese (simplified)" as default Asian
> language, set default Western language to "None", and create some
> text in Chinese, you can observe the following: as long as you enter
> Chinese characters, the language in the status bar says "Chinese
> (simplified)", but when you enter a numeral (not the Chinese
> characters for numerals but "Arabic" numerals), the text language for
> that numeral changes to whatever the default Western language is (in
> this case "None"); the same happens when you enter a Latin full stop
> (instead of a Chinese full stop).


Ah OK, now here is where the _real_ fun begins...

Actually, a style in ODF has 3 (three!) "normal" language attributes:

- fo:language
- style:language-asian
- style:language-complex

And as I mentioned in my last mail, the titles, paragraphs etc. get their
language info from a style.

The spec says that when a CJK character is encountered, the second
attribute (style:language-asian) is to be evaluated, and for CTL
(complex text layout) characters it is "of course" the
style:language-complex attribute.

So "western" actually means "any Unicode character that is neither CJK
nor CTL".
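
As a rough illustration of that rule, the Python sketch below splits a
string into runs and reports which language attribute would apply to each
run. The code point ranges and the function names are assumptions made up
for this example; they are far coarser than what a real implementation
such as OOo uses, and they are not taken from the spec.

```python
from itertools import groupby

# Very rough, illustrative code point ranges (assumption, not from the spec).
ASIAN_RANGES = [
    (0x3000, 0x30FF),  # CJK punctuation, Hiragana, Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul syllables
]
COMPLEX_RANGES = [
    (0x0590, 0x06FF),  # Hebrew, Arabic
    (0x0900, 0x097F),  # Devanagari
    (0x0E00, 0x0E7F),  # Thai
]


def language_attribute_for(ch):
    """Return the name of the ODF attribute that applies to one character."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ASIAN_RANGES):
        return "style:language-asian"
    if any(lo <= cp <= hi for lo, hi in COMPLEX_RANGES):
        return "style:language-complex"
    # Everything else, including digits and Latin punctuation, is "western".
    return "fo:language"


def split_runs(text):
    """Group consecutive characters that use the same language attribute."""
    return [
        (attr, "".join(chars))
        for attr, chars in groupby(text, key=language_attribute_for)
    ]


# Chinese text, "Arabic" numerals, a Chinese full stop, a Latin full stop.
runs = split_runs("\u4e2d\u6587123\u3002.")
for attr, chunk in runs:
    print(attr, repr(chunk))
```

With this classification the numerals and the Latin full stop land in
fo:language runs, while the Chinese characters and the Chinese full stop
land in style:language-asian runs, which matches the switching you saw.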


This explains the status-bar behavior and probably explains the export
to PDF behavior as well (though the PDF export might just pick
fo:language; I haven't tested it for non-western text).

There are also 3 country attributes that may influence this, 3 script
attributes for the writing script _and_ (on top of all that) 3 rfc-language
attributes in case one still couldn't nail down the exact language/country
combination using all the other attributes (in that case, a "closest match"
using the other attributes must also be present).
I think the rfc ones are new attributes in the ODF 1.2 draft (which is the
default version in OOo 3.2, by the way).
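
To show how these attribute families could fit together, here is a
minimal Python sketch that builds one language tag per script type from a
style's properties. The attribute names follow my reading of the ODF 1.2
draft (fo:language, fo:country, fo:script, style:rfc-language-tag and
their -asian / -complex variants); the helper names, the sample values and
the "prefer the rfc tag, else join language-script-country" rule are
assumptions for illustration only.

```python
SUFFIX = {"western": "", "asian": "-asian", "complex": "-complex"}


def attribute_names(kind):
    """Return the four attribute names for one script type."""
    suffix = SUFFIX[kind]
    # The western attributes live in the fo: namespace, the others in style:.
    prefix = "fo:" if kind == "western" else "style:"
    return {
        "language": f"{prefix}language{suffix}",
        "country": f"{prefix}country{suffix}",
        "script": f"{prefix}script{suffix}",
        "rfc": f"style:rfc-language-tag{suffix}",
    }


def effective_tag(props, kind):
    """Pick a language tag for one script type (hypothetical rule)."""
    names = attribute_names(kind)
    rfc = props.get(names["rfc"])
    if rfc:
        # The rfc tag is the most precise description, if present.
        return rfc
    parts = [props.get(names[key]) for key in ("language", "script", "country")]
    parts = [part for part in parts if part]
    return "-".join(parts) if parts else None


# Hypothetical text properties of a single style.
props = {
    "fo:language": "en",
    "fo:country": "GB",
    "style:language-asian": "zh",
    "style:country-asian": "CN",
}

for kind in ("western", "asian", "complex"):
    print(kind, effective_tag(props, kind))
```

Note that the same style ends up carrying up to three unrelated language
tags at once, one per script type.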

And 1 style:script-type for compatibility with CSS etc. that _may_ be used
to indicate whether the style is actually western/asian/complex (not
useful when these are mixed, IMHO).

And some attributes for tables and for numbers... 

(*shiver* OK, this needs to be fixed in the spec...)

It makes sense (well, sort of...) for representing text visually (one could
call it a "font-driven" approach), but erhhh, it is suboptimal from a
content point of view.


Best regards

Bart

Received on Saturday, 14 August 2010 09:57:59 UTC