
Re: Arabic XML question

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Sun, 22 Oct 2006 16:54:12 +0900
Message-Id: <6.0.0.20.2.20061022163814.0af20a50@localhost>
To: "Sandra Bostian" <sbos@loc.gov>, <www-international@w3.org>

Hello Sandra,

At 04:41 06/10/21, Sandra Bostian wrote:
>
>I'm working on some training materials and I have a question about Arabic 
>usage in XML elements and the order of tags in a bidi environment. 
>Normally, in an LTR environment you would get this:
>
><name>content</name>
>
>I'm assuming the order of start and end tags would remain the same in a 
>bidi environment, with both Arabic language content and element names, 
>because these are processor rules and they are expecting a particular 
>syntax.

For the order of the data in the document/file/backing store,
that is true. It absolutely has to be in logical order, otherwise
an XML processor would get the end tag before the start tag.
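To make the logical-order requirement concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The Arabic element name "اسم" ("name") and content "محتوى" ("content") are illustrative choices, not taken from the question. The `swapped` string is a hypothetical file that stores the tags in the order of the first display form from the question, so the end tag comes before the start tag.

```python
import xml.etree.ElementTree as ET

# Illustrative sample data, written as escapes so the source itself
# is not subject to bidi reordering in an editor.
NAME = "\u0627\u0633\u0645"                  # "اسم" (ism, "name")
CONTENT = "\u0645\u062D\u062A\u0648\u0649"   # "محتوى" ("content")

# Logical (storage) order: start tag, content, end tag, in reading order.
logical = f"<{NAME}>{CONTENT}</{NAME}>"
root = ET.fromstring(logical)
print(root.tag == NAME, root.text == CONTENT)

# Hypothetical file that stores the tags in display order instead:
# the parser meets the end tag before any start tag and rejects it.
swapped = f"</{NAME}>{CONTENT}<{NAME}>"
try:
    ET.fromstring(swapped)
except ET.ParseError as err:
    print("rejected:", err)
```

The parser accepts only the logical-order file; what a reader sees on screen is a separate question.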

>But I couldn't find anything confirming or disputing this.

The XML Recommendation definitely defines the logical order
of all the markup. Also, because Unicode is used, this automatically
implies logical order of actual text, such as element names and
element content.
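The following sketch shows what "logical order" means at the character level, assuming the same illustrative Arabic element name "اسم" as a sample. In memory and in a UTF-8 file, the first letter read (alef) is the first code point stored, even though it is displayed at the right-hand end.

```python
import unicodedata

# Illustrative element name "اسم" (ism, "name"): alef, seen, meem.
NAME = "\u0627\u0633\u0645"

# Code points are stored in reading (logical) order, not display order.
for ch in NAME:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The UTF-8 bytes follow the same order.
print(NAME.encode("utf-8"))
```

This prints ALEF, SEEN, MEEM in that order, followed by the UTF-8 bytes `d8 a7`, `d8 b3`, `d9 85`.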

However, the XML Recommendation says nothing about how XML
containing RTL (right-to-left) characters should be displayed.
It doesn't even prescribe how XML containing only LTR (left-to-right)
characters should be displayed, although it contains some examples.

>Can 
>anyone confirm or point me to something that would say that this should not 
>be the way things are:
>
></eman>tnetnoc<eman> or <eman/>tnetnoc<eman>
>
>and that it should be:
>
><eman>tnetnoc</eman>

All three ways of presenting things that you propose here make
sense to some extent. The basic problems are twofold:
1) It's very easy to use a plain text editor to edit LTR-only
   XML. But using a plain text editor to edit XML that contains
   RTL characters can lead to very strange behavior (in some cases,
   none of the three choices above).
2) To some extent, displaying XML with RTL characters leaves room
   for choice. The optimal way of displaying it may depend on the
   percentage of RTL (vs. LTR) characters in the content as well as
   in the markup. It may also depend on how familiar the viewer is
   with LTR XML. As an example, I can imagine that your second
   proposal (<eman/>tnetnoc<eman>) would work well for a text that
   is (almost) completely (both content and markup) RTL, and for
   somebody who isn't used to LTR XML. On the other hand, the
   third proposal (<eman>tnetnoc</eman>) may work well for cases
   with just a bit of RTL, and for people who are familiar with
   LTR XML.
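One reason the display is open to choice is that the markup characters have no strong direction of their own. In Unicode, `<` and `>` have bidi class ON (Other Neutral) and are mirrored, while `/` has class CS (Common Separator) and is not mirrored. Their on-screen position therefore depends on the surrounding strong LTR or RTL characters. The sketch below checks those properties with Python's `unicodedata` module. It then applies a deliberately simplified rule, in which every character is treated as RTL and the text is reversed with `<` and `>` mirrored. This is not the full Unicode Bidi Algorithm, and the helper `naive_rtl_display` and the two-entry `MIRROR` table are hypothetical. Using the Latin stand-ins from the question, this simplified rule turns the logical `<name>content</name>` into the second form, `<eman/>tnetnoc<eman>`.

```python
import unicodedata

# Hand-written subset of bidi mirroring pairs: only the two characters
# XML markup needs here (not the full BidiMirroring.txt table).
MIRROR = {"<": ">", ">": "<"}


def naive_rtl_display(logical: str) -> str:
    """Hypothetical, simplified display: treat every character as RTL,
    reverse the string, and swap mirrored glyphs. Real renderers use
    the Unicode Bidi Algorithm, which is far more involved."""
    return "".join(MIRROR.get(ch, ch) for ch in reversed(logical))


# Bidi class and mirroring flag of the markup characters, a Latin
# letter, and an Arabic letter (alef).
for ch in "<>/n\u0627":
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.mirrored(ch))

print(naive_rtl_display("<name>content</name>"))
```

With real mixed Arabic/Latin XML, the full algorithm resolves each neutral character from its neighbors. That is why a plain text editor can produce layouts that match none of the three forms.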

For more information, please see our IUC28 paper at:
http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html
as well as some additional material (unfortunately somewhat
incomplete) at
http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/.

Please also feel free to contact me personally for further
questions or help.

Regards,     Martin.


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Monday, 23 October 2006 06:33:45 GMT
