Re: [css3-writing-modes] bidi-style resolution of punctuation orientation

From: Florian Rivoal <florianr@opera.com>
Date: Tue, 05 Jul 2011 16:17:30 +0900
To: www-style@w3.org
Message-ID: <op.vx4w7gva4p7avi@eeeflorian>
On Fri, 01 Jul 2011 11:32:51 +0900, fantasai
<fantasai.lists@inkedblade.net> wrote:

> This can be done with the HTML lang tag, which can accept script subtags
> from ISO 15924. If a document is tagged as lang="zh-Hant", we know it is  
> written in traditional Chinese, and therefore will have an upright
> base orientation. Similarly if a document is tagged as lang="ja-Jpan",
> we know it is written in a combination of Han, Hiragana, and Katakana,
> and its base orientation is upright.

I agree this is a good approach.

> The question then is, what do we do if the script is not tagged (as it
> almost never will be)? Do we use a heuristic, or default to one  
> orientation or another? If so, which one?

I think we need to define an algorithm for determining what the language
is. We need this here, and there are a fair few places in CSS3-TEXT where
that would come in handy too. It probably needs to be written in one of
the two specs, and referred to from the other.

The algorithm should probably be something like:
1- if you have a lang attribute, use that
2- otherwise, if you have a Content-Language HTTP header, use that
3- otherwise, if you have a <meta http-equiv="content-language" ...> use
    that
4- otherwise, if you have a charset specified in the HTTP headers and that
    charset is specific to a language (shift-jis, GB, big5, EUC-KR... the
    list must be explicit), you're in that language
5- same as 4, but with a meta tag rather than an HTTP header
6- otherwise, you don't know
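
The steps above could be sketched roughly as follows, in Python. This is
only an illustration under stated assumptions: the names detect_language,
normalize_charset and CHARSET_LANGUAGE are hypothetical, the inputs are
assumed to be already extracted from the document and the HTTP response,
and the charset table is a guess at what the explicit list might contain,
not a proposed list.

```python
from typing import Optional

# Hypothetical, illustrative table of language-specific charsets.
# The real list would have to be spelled out explicitly by the spec.
CHARSET_LANGUAGE = {
    "shift-jis": "ja",
    "big5": "zh-Hant",
    "euc-kr": "ko",
}


def normalize_charset(name: str) -> str:
    """Lower-case a charset label and treat '_' and '-' alike."""
    return name.strip().lower().replace("_", "-")


def detect_language(
    lang_attr: Optional[str] = None,
    http_content_language: Optional[str] = None,
    meta_content_language: Optional[str] = None,
    http_charset: Optional[str] = None,
    meta_charset: Optional[str] = None,
) -> Optional[str]:
    """Return a language tag, or None when the language is unknown."""
    # Steps 1-3: explicit language declarations, in priority order.
    for declared in (lang_attr, http_content_language, meta_content_language):
        if declared:
            return declared
    # Steps 4-5: language-specific charset, HTTP header before meta tag.
    for charset in (http_charset, meta_charset):
        if charset and normalize_charset(charset) in CHARSET_LANGUAGE:
            return CHARSET_LANGUAGE[normalize_charset(charset)]
    # Step 6: we don't know.
    return None
```

For example, detect_language(http_charset="Shift_JIS") falls through
steps 1 to 3 and is resolved by step 4, while a page served as UTF-8
with no language declaration at all ends up at step 6.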

Not sure if steps 4 and 5 are a good idea though. A variation on 4 and 5
could be to require that the HTTP headers or meta tags specify one such
encoding and that at least one non-ASCII character is used in the actual
content. But that has performance implications I am not sure I like, and
it might not really help anyway.
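
To make the cost of that variation concrete, here is a minimal Python
sketch. The function name charset_hint_applies and its parameters are
hypothetical; it assumes the raw bytes of the content are available.

```python
def charset_hint_applies(charset_is_language_specific: bool, body: bytes) -> bool:
    """Trust the charset hint only if the content is not pure ASCII."""
    # Any byte above 0x7F means at least one non-ASCII character is present.
    # In the worst case (pure ASCII content) this scans the entire body.
    return charset_is_language_specific and any(byte > 0x7F for byte in body)
```

The worst-case full scan of the content is where the performance concern
comes from.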

Having an exhaustive list of all languages and their respective
orientations is a lot of work, and I don't think it is necessary. Since I
believe that the languages that want sideways outnumber the languages that
want upright by a lot, we should have a list of all the languages that are
upright, and say that everything else is sideways.

This approach also gives us an answer to what to do when we don't know
the language: sideways. I don't think it really matters which arbitrary
default we pick, as long as browsers are consistent about it, but this
seems like a decent choice.
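
A minimal Python sketch of that lookup, assuming the language has already
been determined (or is unknown). The name base_orientation and the
contents of UPRIGHT_LANGUAGES are hypothetical; the set is deliberately
incomplete and only illustrates the shape of the list.

```python
from typing import Optional

# Hypothetical and deliberately incomplete: primary language subtags
# whose base orientation in vertical text is upright.
UPRIGHT_LANGUAGES = {"ja", "zh", "ko"}


def base_orientation(lang_tag: Optional[str]) -> str:
    """Return "upright" for listed languages, "sideways" for all others."""
    if not lang_tag:
        # Unknown language: fall back to the default.
        return "sideways"
    # Compare only the primary language subtag, e.g. "zh" in "zh-Hant".
    primary = lang_tag.lower().split("-")[0]
    return "upright" if primary in UPRIGHT_LANGUAGES else "sideways"
```

Only the upright list has to be maintained; an unknown language and an
unlisted language take the same path.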

  - Florian
Received on Tuesday, 5 July 2011 07:17:55 GMT
