- From: Martin Duerst <duerst@it.aoyama.ac.jp>
- Date: Sun, 22 Oct 2006 16:54:12 +0900
- To: "Sandra Bostian" <sbos@loc.gov>, <www-international@w3.org>
Hello Sandra,

At 04:41 06/10/21, Sandra Bostian wrote:
>
>I'm working on some training materials and I have a question about Arabic
>usage in XML elements and the order of tags in a bidi environment.
>Normally, in an LTR environment you would get this:
>
><name>content</name>
>
>I'm assuming the order of start and end tags would remain the same in a
>bidi environment, with both Arabic language content and element names,
>because these are processor rules and they are expecting a particular
>syntax.

For the order of the data in the document/file/backing store,
that is true. It absolutely has to be in logical order, otherwise
an XML processor would get the end tag before the start tag.

>But I couldn't find anything confirming or disputing this.

The XML Recommendation definitely defines the logical order of
all the markup. Also, because Unicode is used, this automatically
implies logical order of actual text, such as element names and
element content.

However, the XML Recommendation does not say anything about how XML
containing RTL (right-to-left) characters should be displayed.
It doesn't even prescribe how XML containing LTR (left-to-right)
characters should be displayed, although it contains some examples.

>Can
>anyone confirm or point me to something that would say that this should not
>be the way things are:
>
></eman>tnetnoc<eman> or <eman/>tnetnoc<eman>
>
>and that it should be:
>
><eman>tnetnoc</eman>

All three ways of presenting things that you propose here make
sense to some extent. There are two basic problems:

1) It's very easy to use a plain text editor for editing LTR-only
   XML. But using a plain text editor for editing XML that contains
   RTL characters can lead to very strange behavior (in some cases,
   none of the three choices above).

2) Displaying XML with RTL characters leaves some choices open.
   The optimal way of displaying may depend on the percentage of
   RTL (vs. LTR) characters in the content as well as in the markup.
   It may also depend on how familiar the viewer is with LTR XML.

As an example, I can imagine that your second proposal
(<eman/>tnetnoc<eman>) would work well for a text that is
(almost) completely RTL (both content and markup), and for
somebody who isn't used to LTR XML. On the other hand,
the third proposal (<eman>tnetnoc</eman>) may work well
for cases with just a bit of RTL, and for people who are
familiar with LTR XML.

For more information, please see our IUC28 paper at:
http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/IUC28.html
as well as some additional material (unfortunately somewhat
incomplete) at http://www.sw.it.aoyama.ac.jp/2005/pub/IUC28-bidi/.

Please also feel free to contact me personally for further
questions or help.

Regards,    Martin.

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
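
To make the logical-order point concrete, here is a minimal Python sketch, offered as an illustration rather than as anything taken from the IUC28 paper. The Arabic element name and content are hypothetical stand-ins for "name" and "content", written as escapes so that the order in the source is unambiguous. It checks that a parser accepts the data when the start tag comes first in the backing store, rejects it when the end tag comes first, and shows that the characters themselves carry a right-to-left bidi class that only matters for display.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical Arabic element name and content, given as escapes so the
# stored (logical) order of the characters is unambiguous in this source.
NAME = "\u0627\u0633\u0645"
CONTENT = "\u0645\u062D\u062A\u0648\u0649"

# Logical order in the backing store: start tag, content, end tag.
LOGICAL = "<" + NAME + ">" + CONTENT + "</" + NAME + ">"

# The same pieces stored with the end tag first. This is what one of the
# displayed forms might look like, but it is not valid as stored data.
END_TAG_FIRST = "</" + NAME + ">" + CONTENT + "<" + NAME + ">"


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidi class of each character in the text."""
    return [unicodedata.bidirectional(ch) for ch in text]


if __name__ == "__main__":
    print(is_well_formed(LOGICAL))        # stored in logical order
    print(is_well_formed(END_TAG_FIRST))  # end tag stored before start tag
    print(bidi_classes(NAME))             # display-related property only
    print(bidi_classes("<>/"))
```

The parser only ever sees the stored sequence of characters; how an editor or viewer arranges those characters on screen is a separate question that this sketch does not address.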
Received on Monday, 23 October 2006 06:33:45 UTC