
Re: Proposal for new direction attribute

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 21 Feb 2013 18:44:32 +0100
To: Aharon (Vladimir) Lanin <aharon@google.com>
Cc: Andrew Cunningham <acunningham@slv.vic.gov.au>, "public-i18n-bidi@w3.org" <public-i18n-bidi@w3.org>, "Phillips, Addison" <addison@lab126.com>, Richard Ishida <ishida@w3.org>, www International <www-international@w3.org>, "Amir E. Aharoni" <amir.aharoni@mail.huji.ac.il>
Message-ID: <20130221184432317412.eb8ec49f@xn--mlform-iua.no>

What I propose could also work when the language is unknown.
In his bug report, Amir proposed that the new behavior should depend
on the DOCTYPE. That would not be acceptable, as it would introduce
versioning into HTML, which runs counter to the direction HTML5 has
taken.

But it would be possible to introduce "versioning" within the direction
feature itself. That is what CSS 3 has done with the unicode-bidi:embed
vs. unicode-bidi:isolate option. Such "versioning", or behavior switch,
is present for several CSS features, for instance the table-layout
property.

I proposed using dir="auto" or, probably better, dir="isolate" as a
trigger for the new isolate behavior. But it would also be possible to
introduce a new attribute to control the new behavior. Why not call it
unicode-bidi="*"? The unicode-bidi attribute could take the values
<empty-string>, embed and isolate, where the empty string would be
equivalent to unicode-bidi="isolate". Authors could thus simply add
unicode-bidi="" to get the new behavior.
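A minimal sketch of how the proposed attribute values could resolve, written in Python purely for illustration. The unicode-bidi HTML attribute exists only as a suggestion in this message, and treating unrecognised values like embed is an assumption; the returned strings are the real CSS unicode-bidi keywords, with embed standing for the non-isolating behavior dir="ltr"/"rtl" has today.

```python
from typing import Optional

# Hypothetical: the HTML unicode-bidi attribute suggested above does not exist
# in any specification. The return values are real CSS unicode-bidi keywords.
EMBED = "embed"      # today's non-isolating behavior of dir="ltr"/"rtl"
ISOLATE = "isolate"  # the new behavior the attribute would opt into


def resolve_unicode_bidi(attr_value: Optional[str]) -> str:
    """Map the proposed attribute, on an element with dir="ltr" or dir="rtl",
    to the CSS unicode-bidi value it would trigger."""
    if attr_value is None:
        # Attribute absent: keep the legacy embedding behavior.
        return EMBED
    value = attr_value.strip().lower()
    if value in ("", ISOLATE):
        # The empty string is the proposed shorthand for "isolate".
        return ISOLATE
    # "embed", and (by assumption) any unrecognised value, keep embedding.
    return EMBED


for attr in (None, "", "isolate", "embed"):
    print(repr(attr), "->", resolve_unicode_bidi(attr))
```

Both unicode-bidi="" and unicode-bidi="isolate" opt into isolation, while documents that never use the attribute keep today's rendering, which is the "versioning inside the feature" idea rather than versioning of HTML as a whole.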

Perhaps in the future, say in HTML6, authors would no longer need
unicode-bidi="" to get isolate behavior, in which case unicode-bidi=""
could be made optional.

It seems to me much better to introduce a new attribute with
versioning-like behavior than to introduce a new attribute that is
merely a longer name for the attribute we already have. It would,
IMHO, also be very untypical of the HTML Working Group to do such a

Leif Halvard Silli

Aharon (Vladimir) Lanin, Thu, 21 Feb 2013 15:54:26 +0200:
> Can we please take it as a given that the language of a piece of data is
> very often unknown to the web application? I know that this is not the case
> at Wikipedia, but generally web apps have to deal with things like user
> generated content, which can be in an arbitrary language and can (and does)
> mix languages, and with information obtained from various databases, which
> do not always declare the language of their data. In addition, even when
> the language of text data is ultimately available at the data source,
> applications are rarely written a priori to pass that information around
> together with each piece of text data, and modifying an existing app to do
> so is usually a daunting task that rarely makes it to the list of
> priorities. Thus, we must have a solution for isolates that works in the
> absence of the lang attribute. This is what the proposal is about. Once we
> decide how that should be done, we can try to address the lang issue.
> Aharon
> On Thu, Feb 21, 2013 at 3:53 AM, Leif Halvard Silli wrote:
>> Hi Andrew!
>> There might be a risk that small languages will be “forgotten”. But if
>> users see that it works for “big” languages, then they will expect that
>> it works for the small languages as well.
>> On the other hand, this would not be the first time that Web developers
>> have to take into consideration that a certain new feature might not
>> yet be supported across all browsers.
>> The main benefit here is to introduce the new isolation behavior. Thus,
>> by first setting <html dir="auto">, you can, if you want, continue to
>> tag languages as you do today. For example, you could do this:
>> <html lang="he" dir="auto"><body dir="rtl">Lorem.</body></html>
>> This would work both in legacy browsers and in updated browsers.
>> Alternatively, instead of "auto", one could introduce a new value, for
>> example "isolate":
>> <html lang="he" dir="isolate"><body dir="rtl">Lorem.</body></html>
>> It seems to me that the method I propose here has not been considered
>> by those who worked out the proposal.[1]
>> [1]

>> Leif H. Silli
>> Andrew Cunningham, Thu, 21 Feb 2013 09:58:56 +1100:
>>> Hi, as far as I can tell, there are two key issues here:
>>> 1) when developing a multilingual web service, a developer would need to
>>> inject additional markup whenever the language of the content changes
>>> between a supported and an unsupported language. This can be done, but
>>> it places an undue burden on web developers unless
>>> 2) all web browser developers document which languages are supported in
>>> this way.
>>> And honestly, up-to-date documentation for this is unlikely. Currently
>>> the only way to determine whether browsers support specific behaviours
>>> based on language tagging is to examine the browser source code, where
>>> available.
>>> On Feb 21, 2013 7:49 AM, "Leif Halvard Silli" wrote:
>>>> With the way I propose to link direction to @lang,[1] one could do this
>>>> to get the new isolation behavior for unsupported languages:
>>>> <div lang="unsupported-language" dir="auto">
>>>>     <div dir="rtl">Lorem Ipsum</div>
>>>> </div>
>>>> [1]
>>>> http://www.w3.org/mid/20130220214342391704.d3a0addd@xn--mlform-iua.no

>>>> Leif H Silli
>>>> Andrew Cunningham, Thu, 21 Feb 2013 07:25:20 +1100:
>>>>> Hi Amir,
>>>>> In theory, basing it on language sounds good, but I doubt it would be
>>>>> practical. I suspect that even if browser developers implemented it, it
>>>>> would only cover a small subset of languages, and it could damage
>>>>> minority languages, i.e. set the direction incorrectly for them.
>>>>> Additionally, a number of languages have orthographies using different
>>>>> scripts and require different directions to be set.
>>>>> In theory this could be covered by accurate language tagging that
>>>>> includes script codes where necessary. But ...
>>>>> Personally, as a developer working with multiple languages, I prefer to
>>>>> have full control over languages, their typography, text direction and
>>>>> other aspects.
>>>>> On Feb 21, 2013 5:37 AM, "Amir E. Aharoni"
>>>>> <amir.aharoni@mail.huji.ac.il> wrote:
>>>>>> 2013/2/20 Phillips, Addison <addison@lab126.com>:
>>>>>>> Hello Amir,
>>>>>>> In my opinion, using the @lang attribute to set direction is a bad
>>>>>>> idea. The language tag is not an explicit indicator of the direction
>>>>>>> of content. It may, of course, imply the direction. But it is a poor
>>>>>>> indicator compared to either explicit direction or to first strong
>>>>>>> (auto direction).
>>>>>> Contrariwise: first-strong is just a poor heuristic when no other
>>>>>> information about direction is available.
>>>>>> dir="rtl/ltr" is what's used in practice today, of course, and it's
>>>>>> OK, but how is it used? How does the developer decide that something
>>>>>> should be ltr or rtl? According to the language, of course. At least
>>>>>> that's what happens in major CMSs, like WordPress and MediaWiki. I am
>>>>>> a developer of the latter; it applies dir server-side (and sometimes
>>>>>> client-side) according to the language whenever it is known. We currently
>>>>>> maintain our list of languages, with a direction specified for each
>>>>>> language, and we are gradually moving to using the CLDR for providing
>>>>>> information about the writing system, and hence the direction, of each
>>>>>> language. I cannot imagine web developers doing anything else. And
>>>>>> since that's what's happening in practice, it should be done by the
>>>>>> browser.
>>>>>> There are edge cases, the most famous examples being Punjabi and
>>>>>> Azeri, but as I explain in
>>>>>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=19888 , using correct
>>>>>> language codes solves this problem. Developers should use a correct
>>>>>> lang attribute anyway. This also means that "few people use the lang
>>>>>> attribute" is a weak argument.
>>>>>> What I am proposing is to apply a *default* direction according to the
>>>>>> specified language, and to make it possible to override with an
>>>>>> explicit dir (or direction) attribute.
>>>>>> --
>>>>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>>>>> http://aharoni.wordpress.com

>>>>>> ‪“We're living in pieces,
>>>>>> I want to live in peace.” – T. Moore‬
>>>>>>> Having @lang start a new isolate might be worthwhile, though, since
>>>>>>> one language embedded in another might very well have different
>>>>>>> directional characteristics and there is no reason to require users
>>>>>>> to input both attributes if the content does not inherently require
>>>>>>> more complex markup.
>>>>>>> Addison Phillips
>>>>>>> Globalization Architect (Lab126)
>>>>>>> Chair (W3C I18N WG)
>>>>>>> Sent from my Kindle Fire HD
>>>>>>> "Amir E. Aharoni" <amir.aharoni@mail.huji.ac.il> wrote:
>>>>>>> The direction/dir transition plan is nice.
>>>>>>> It's a bit disappointing, though, that neither of the following
>>>>>>> suggestions was considered:
>>>>>>> 1. Make any element with an explicit lang or dir attribute
>>>>>>> bidi-isolated by default
>>>>>>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18490

>>>>>>> 2. Apply the direction according to language
>>>>>>> https://www.w3.org/Bugs/Public/show_bug.cgi?id=19888

>>>>>>> Is there, maybe, a plan to consider this in the future?
>>>>>>> --
>>>>>>> Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
>>>>>>> http://aharoni.wordpress.com

>>>>>>> ‪“We're living in pieces,
>>>>>>> I want to live in peace.” – T. Moore‬
>>>>>>> 2013/2/20 Richard Ishida <ishida@w3.org>:
>>>>>>>> Unicode 6.3 will shortly be released, and will contain new control
>>>>>>>> codes (RLI, LRI, FSI, PDI) to enable authors to express isolation at
>>>>>>>> the same time as direction in inline bidirectional text. The Unicode
>>>>>>>> Consortium recommends that isolation be used as the default for all
>>>>>>>> future inline bidirectional text embeddings.
>>>>>>>> The i18n WG has been discussing how to ensure that HTML5 encourages
>>>>>>>> and enables content authors to adopt and apply isolation *as the
>>>>>>>> default* whenever they set direction on inline content, and
>>>>>>>> discourage future use of dir=rtl or dir=ltr (which does not produce
>>>>>>>> isolation).
>>>>>>>> The proposal of the WG, with rationales, can be found at
>>>>>>>> http://www.w3.org/International/wiki/Html-bidi-isolation

>>>>>>>> i18n WG folks, please let me know asap if you think this needs
>>>>>>>> changing in some way.
>>>>>>>> RI
>>>>>>>> --
>>>>>>>> Richard Ishida
>>>>>>>> W3C
>>>>>>>> http://rishida.net/
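The positions in the thread above can be compared in one small sketch, written in Python purely for illustration: an explicit dir always wins; otherwise a default is derived from the language tag, as Amir proposes, with script subtags covering the Punjabi and Azeri cases; otherwise the first-strong heuristic behind dir="auto" is used; and the result can be expressed with the Unicode 6.3 isolate controls Richard mentions. All function names, the first-strong fallback order, and the DEFAULT_SCRIPT table (a hypothetical stand-in for CLDR data) are assumptions, and first_strong() is simplified: unlike the Unicode Bidirectional Algorithm, it does not skip text inside nested isolates.

```python
import unicodedata
from typing import Optional

# Unicode 6.3 isolate controls: LRI, RLI, FSI and PDI.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

# Hypothetical, tiny stand-in for CLDR likely-script data (not real CLDR).
DEFAULT_SCRIPT = {
    "ar": "Arab", "fa": "Arab", "ur": "Arab", "he": "Hebr", "yi": "Hebr",
    "pa": "Guru", "az": "Latn", "en": "Latn",
}
# A few right-to-left scripts (ISO 15924 codes); deliberately incomplete.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo"}


def direction_from_lang(lang: str) -> Optional[str]:
    """Default direction implied by a BCP 47 tag, or None if unknown."""
    subtags = lang.replace("_", "-").split("-")
    script = None
    for sub in subtags[1:]:
        if len(sub) == 1:  # extension or private-use singleton: stop looking
            break
        if len(sub) == 4 and sub.isalpha():  # explicit script, e.g. pa-Arab
            script = sub.title()
            break
    if script is None:
        script = DEFAULT_SCRIPT.get(subtags[0].lower())
    if script is None:
        return None  # unknown language: no language-based default
    return "rtl" if script in RTL_SCRIPTS else "ltr"


def first_strong(text: str) -> Optional[str]:
    """Simplified first-strong detection (does not skip isolated runs)."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def resolve_direction(text: str, dir_attr: Optional[str] = None,
                      lang: Optional[str] = None) -> str:
    """Explicit dir wins, then a language default, then first strong."""
    if dir_attr in ("ltr", "rtl"):
        return dir_attr
    if dir_attr != "auto" and lang:
        by_lang = direction_from_lang(lang)
        if by_lang is not None:
            return by_lang
    return first_strong(text) or "ltr"  # like dir="auto": ltr if nothing strong


def isolate(text: str, direction: Optional[str] = None) -> str:
    """Wrap text in isolate controls; FSI lets the first strong char decide."""
    opener = {"ltr": LRI, "rtl": RLI}.get(direction, FSI)
    return opener + text + PDI
```

With lang="pa-Arab" the sketch returns rtl even for Latin sample text, while plain lang="pa" falls back to the Gurmukhi default and returns ltr. With no dir and an unknown or missing language it falls back to first strong, which is the case Aharon says web applications hit most often.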

Received on Thursday, 21 February 2013 17:45:09 UTC
