- From: Richard Ishida <ishida@w3.org>
- Date: Thu, 2 Aug 2007 13:16:09 +0100
- To: "'Robert Burns'" <rob@robburns.com>, "'Sander Tekelenburg'" <st@isoc.nl>
- Cc: <public-html@w3.org>
> From: public-html-request@w3.org
> [mailto:public-html-request@w3.org] On Behalf Of Robert Burns
> Sent: 01 August 2007 07:18

> That is a good example. However, the RFC 3066 language codes
> allow one to specify both language and different script

Note in passing that RFC 3066 didn't allow this, and it is now an obsolete specification. It was replaced by RFC 4646, which does allow for scripts to be specified, though only when absolutely necessary to distinguish usage, not as a matter of course. (See http://www.w3.org/International/articles/language-tags/ for more details.)

> variants. So Hebrew written with the Latin script could be
> designated by lang='iw-LATN' (dir='LTR'); standard Hebrew as
> lang='iw' (dir='RTL'); Turkish as lang='tr-LATN'; and
> Turkish in Arabic as lang='tr-Arab' (dir='RTL').

Note in passing that iw is an obsolete code for Hebrew; you should now use 'he'. See
http://people.w3.org/rishida/utils/subtags/index.php?searchtext=hebrew&submit=Search&searchtype=2

>
> With these RFC 3066 language codes everything necessary to
> designate directionality is already there. I think the reason
> we have both @dir and @lang is so that authors have more
> flexibility in how much language detail to provide. Also UAs
> do not have to hard-wire the mappings of about RFC 3066
> scripts codes to directionality and extract script
> information from the language codes. That's just my
> speculation on this but perhaps someone else knows more of
> the history behind this.

See http://www.w3.org/TR/i18n-html-tech-lang/#ri20050208.093646470

In fact, the use of these two attributes doesn't always coincide. In a document that is generally in English you may have a small table that contains only Hebrew or Arabic text.
Although it would make sense to use @lang once on the <table> element, so that it signifies that all the text in the table is in a given language and you don't have to repeat it, you would probably *not* want the table columns to flow from right to left (as would usually be the case when using dir="rtl" on the table), since this is an English document. If xml:lang were associated with direction, you would probably have no control over that. The same goes for list items.

Basically, the two attributes do different jobs. It is better to reduce confusion and scope for error by giving the attributes simple, clear semantics.

It is also perfectly acceptable for people to have been labelling legacy Azerbaijani content as 'az' until now, and to continue to do so in the future, but that carries no information about whether they used the Cyrillic (LTR) or Arabic (RTL) script, since Azerbaijani uses both.

An IPA (International Phonetic Alphabet) transcription of Hebrew could well be marked as 'he', but it would be incorrect to assume that the directionality was RTL.

Hope that helps,
RI

PS: Note also that @dir as used in DITA, XHTML2, ITS, etc. has additional values of lro (left-right-override) and rlo (right-left-override), which cannot be expressed by @lang. In fact we could consider making that the case for HTML5, and deprecating the <bdo> tag, though that is a separate thread. If we did, then it would be clearer that @dir has a different role than @lang.

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/People/Ishida/
http://www.w3.org/International/
http://people.w3.org/rishida/blog/
http://www.flickr.com/photos/ishida/

> -----Original Message-----
> To: Sander Tekelenburg
> Cc: public-html@w3.org
> Subject: Re: authoring @lang and @dir (was 3.6. The root element)
>
> On Jul 31, 2007, at 2:33 AM, Sander Tekelenburg wrote:
>
> >>> (Given how many 'rtl natives' also speak english, french, etc.
> >>> I suspect the same, although perhaps somewhat less widespread, applies
> >>> to @dir.)
> >>
> >> There I think the direction is very dependent on the language.
> >> [...] Once @lang is there, @dir can be computed accordingly.
> >
> > I don't think so. I don't speak most languages :) but for sure it
> > isn't as clear cut as a specific language equaling a specific
> > direction. (Or else why the need for @dir at all?) @dir is about
> > scripts, not languages. One language
> > can be expressed in different scripts, so can have different
> > directionalities.
> >
> > I've no idea how widespread that practice actually is, but for
> > instance romanization of hebrew appears to be rather common:
> > <http://en.wikipedia.org/wiki/Romanization_of_Hebrew#Modern_uses>.
> > Notably
> > "Some Hebrew speakers use romanization to communicate when using
> > Internet systems that have poor support for the Hebrew alphabet."
> > suggests that at least romanization is likely used in many other
> > languages as well. (But
> > always take Wikipedia with a grain of salt.)
>
> Take care,
> Rob
>
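
The argument in this message, that a language tag on its own does not settle directionality, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the helper `direction_from_tag` and the `RTL_SCRIPTS` set are hypothetical, the set is deliberately incomplete, and treating every other script as left-to-right is a simplification. It is not how any user agent or specification defines the behaviour.

```python
# Hypothetical illustration only; not taken from any specification or UA.
# Deliberately incomplete set of right-to-left script subtags.
RTL_SCRIPTS = {"Arab", "Hebr"}


def direction_from_tag(tag):
    """Guess a direction from an explicit script subtag, if one is present.

    Returns 'rtl' or 'ltr' when the tag carries a four-letter script
    subtag, and None when it does not, because the language subtag
    alone (e.g. 'az' or 'he') says nothing reliable about the script.
    """
    subtags = tag.split("-")
    # Skip the primary language subtag; look for a four-letter script subtag.
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            script = subtag.title()  # normalise 'LATN' / 'latn' to 'Latn'
            return "rtl" if script in RTL_SCRIPTS else "ltr"
    return None


for tag in ("az", "az-Arab", "az-Cyrl", "he", "he-LATN"):
    print(tag, direction_from_tag(tag))
```

Under these assumptions, 'az' and 'he' yield no direction at all, which is the case an explicit @dir is there to cover. Even when a script subtag is present, the sketch says nothing about whether a table's columns should flow right to left, which is a separate layout decision.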
Received on Thursday, 2 August 2007 12:14:31 UTC