RE: authoring @lang and @dir (was 3.6. The root element)

From: Richard Ishida <ishida@w3.org>
Date: Thu, 2 Aug 2007 13:16:09 +0100
To: "'Robert Burns'" <rob@robburns.com>, "'Sander Tekelenburg'" <st@isoc.nl>
Cc: <public-html@w3.org>
Message-ID: <094901c7d4fe$e92adab0$6401a8c0@rishida>

> From: public-html-request@w3.org 
> [mailto:public-html-request@w3.org] On Behalf Of Robert Burns
> Sent: 01 August 2007 07:18

> That is a good example. However, the RFC 3066 language codes 
> allow one to specify both language and different script 

Note in passing that RFC 3066 didn't allow this, and it is now an obsolete
specification. It was replaced by RFC 4646, which does allow for scripts to
be specified, though only when absolutely necessary to distinguish usage,
not as a matter of course.
(See http://www.w3.org/International/articles/language-tags/ for more
details.)


> variants. So Hebrew written with the Latin script could be 
> designated by lang='iw-LATN' (dir='LTR'); standard Hebrew as 
> lang='iw' (dir='RTL'); Turkish as lang='tr-LATN'; and 
> Turkish in Arabic as lang='tr-Arab' (dir='RTL').

Note in passing that iw is an obsolete code for Hebrew; you should now use
'he'. See
http://people.w3.org/rishida/utils/subtags/index.php?searchtext=hebrew&submit=Search&searchtype=2
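A minimal Python sketch of how a tool might rewrite that deprecated primary subtag before doing anything else with a tag. The table is a hypothetical, hand-picked subset (iw to he, plus in to id and ji to yi); the IANA Language Subtag Registry, not this snippet, is the authoritative source for preferred values.

```python
# Hypothetical, hand-picked subset of deprecated primary language subtags
# and their preferred replacements. A real tool should read the IANA
# Language Subtag Registry instead of hard-coding this table.
DEPRECATED_PRIMARY = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def normalize_primary(tag: str) -> str:
    """Replace a deprecated primary subtag, keeping any remaining subtags."""
    primary, sep, rest = tag.partition("-")
    preferred = DEPRECATED_PRIMARY.get(primary.lower(), primary)
    return preferred + sep + rest


print(normalize_primary("iw"))       # he
print(normalize_primary("iw-Latn"))  # he-Latn
print(normalize_primary("tr-Arab"))  # tr-Arab (unchanged)
```

Only the primary subtag is touched, so any script or region subtag that follows is carried over unchanged.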

> 
> With these RFC 3066 language codes everything necessary to 
> designate directionality is already there. I think the reason 
> we have both @dir and @lang is so that authors have more 
> flexibility in how much language detail to provide. Also UAs 
> do not have to hard-wire the mappings of RFC 3066 
> script codes to directionality and extract script 
> information from the language codes. That's just my 
> speculation on this but perhaps someone else knows more of 
> the history behind this.

See http://www.w3.org/TR/i18n-html-tech-lang/#ri20050208.093646470

In fact, where you would want to use these two attributes doesn't always
coincide.

In a document that is generally in English you may have a small table that
contains only Hebrew or Arabic text. Although it would make sense to use
@lang once on the <table> element, so that it signifies that all the text in
the table is in a given language and you don't have to repeat it, you would
probably *not* want the table columns to flow from right to left (as would
usually be the case when using dir="rtl" on the table), since this is an
English document. If the language attribute (@lang or xml:lang) also set the
direction, you would probably have no control over that. The same goes for
list items.

Basically, the two attributes do different jobs. It is better to reduce
confusion and the scope for error by giving each attribute simple, clear
semantics.

It is also perfectly acceptable for people to have been labelling legacy
Azerbaijani content as 'az' until now, and to continue to do so in the
future, but that label carries no information about whether they used the
Cyrillic (LTR) or Arabic (RTL) script, since Azerbaijani uses both.

An IPA (International Phonetic Alphabet) transcription of Hebrew could well
be marked as 'he', but it would be incorrect to assume that the
directionality was RTL.
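To make the Azerbaijani and IPA cases concrete, here is a minimal Python sketch assuming a hypothetical user agent that infers direction only when a tag carries an explicit script subtag. 'az' and 'he' yield no answer, while 'az-Arab' or 'he-Latn' can. The script table is a small illustrative subset of ISO 15924 codes, the parser is simplified (it just takes the first four-letter alphabetic subtag before any singleton), and nothing here covers the override values mentioned below, so it is a hint at best and no substitute for @dir.

```python
from typing import Optional

# Hypothetical, deliberately incomplete table of ISO 15924 script codes
# and the inline direction they usually imply.
SCRIPT_DIRECTION = {
    "latn": "ltr",
    "cyrl": "ltr",
    "hebr": "rtl",
    "arab": "rtl",
}


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag if present (simplified RFC 4646 parsing)."""
    subtags = tag.split("-")
    if len(subtags[0]) == 1:  # private-use or grandfathered 'x-...'/'i-...'
        return None
    for sub in subtags[1:]:
        if len(sub) == 1:  # singleton: extension or private use follows
            break
        if len(sub) == 4 and sub.isalpha():  # 4-letter variants start with a digit
            return sub.lower()
    return None


def direction_from_tag(tag: str) -> Optional[str]:
    """Guess direction only from an explicit script subtag; else unknown."""
    script = script_subtag(tag)
    if script is None:
        return None  # e.g. 'az' or 'he': the language alone does not say
    return SCRIPT_DIRECTION.get(script)


for t in ["az", "az-Arab", "az-Cyrl", "he", "he-Latn"]:
    print(t, direction_from_tag(t))
```

Returning None for plain 'he' rather than guessing 'rtl' mirrors the IPA example: the language subtag alone cannot say which way the text runs.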

Hope that helps,
RI


PS: Note also that @dir as used in DITA, XHTML2, ITS, etc. has the additional
values lro (left-to-right override) and rlo (right-to-left override), which
cannot be expressed by @lang. In fact we could consider adding those values
to @dir in HTML5 and deprecating the <bdo> element, though that is a separate
thread. If we did, it would be even clearer that @dir has a different role
from @lang.


============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)
 
http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/
 
 

> -----Original Message-----
> To: Sander Tekelenburg
> Cc: public-html@w3.org
> Subject: Re: authoring @lang and @dir (was 3.6. The root element)
> 
> 
> 
> On Jul 31, 2007, at 2:33 AM, Sander Tekelenburg wrote:
> 
> >>> (Given how many 'rtl natives' also speak English, French, etc. I 
> >>> suspect the same, although perhaps somewhat less widespread, applies 
> >>> to @dir.)
> >>
> >> There I think the direction is very dependent on the language.
> >> [...] Once @lang is there, @dir can be computed accordingly.
> >
> > I don't think so. I don't speak most languages :) but for sure it 
> > isn't as clear cut as a specific language equaling a specific 
> > direction. (Or else why the need for @dir at all?) @dir is about 
> > scripts, not languages. One language can be expressed in different 
> > scripts, so can have different directionalities.
> >
> > I've no idea how widespread that practice actually is, but for 
> > instance romanization of Hebrew appears to be rather common:
> > <http://en.wikipedia.org/wiki/Romanization_of_Hebrew#Modern_uses>. 
> > Notably, "Some Hebrew speakers use romanization to communicate when 
> > using Internet systems that have poor support for the Hebrew 
> > alphabet." suggests that at least romanization is likely used in 
> > many other languages as well. (But always take Wikipedia with a 
> > grain of salt.)
> 
> 
> Take care,
> Rob
> 
Received on Thursday, 2 August 2007 12:14:31 GMT
