- From: Jony Rosenne <rosennej@qsm.co.il>
- Date: Tue, 2 Aug 2005 20:42:47 +0200
- To: "'WWW International'" <www-international@w3.org>
1) For block-level elements, Unicode control characters do not provide an
answer. The Unicode bidi algorithm specifies a workaround when a higher-level
protocol does not provide the base directionality, and this workaround is
quite often unsatisfactory.

2) For inline elements, such as your example, markup is equivalent and it's a
matter of taste. See the note at the end of 8.2.3. I prefer markup because it
is visible.

Jony

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of Tex Texin
> Sent: Tuesday, August 02, 2005 1:53 PM
> To: WWW International
> Subject: Bidi Markup vs Unicode control characters
>
> This has been bothering me for a while and I would like to see if anyone
> has a better answer as to why we recommend markup over bidi controls.
>
> The recommendation in HTML 4.0, and also in the joint recommendation of
> the W3C and Unicode Consortium on Unicode in XML and Markup Languages, is
> that the bidirectional markup is to be preferred over the Unicode
> control characters.
>
> The argument is made in
> http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.2
> and referenced by http://www.unicode.org/reports/tr20/ (section 2+ and
> 3.1).
>
> In particular, HTML 4.0 says:
> =============
> Although Unicode specifies special characters that deal with text
> direction, HTML offers higher-level markup constructs that do the same
> thing: the dir attribute (do not confuse with the DIR element) and the
> BDO element. Thus, to express a Hebrew quotation, it is more intuitive
> to write
>
> <Q lang="he" dir="rtl">...a Hebrew quotation...</Q>
>
> than the equivalent with Unicode references:
>
> &#x202B;&#x05F4;...a Hebrew quotation...&#x05F4;&#x202C;
> =================
>
> Now several years ago, I agreed with this. However, several years ago,
> most editors had not implemented the Unicode bidi algorithm and did not
> display bidirectional plain text properly. So we were (at least I was)
> doing a lot of hand editing and it was not WYSIWYG.
>
> Today the situation is very different. Many editors implement and
> support the Unicode bidirectional algorithm and the associated control
> codes.
>
> So today, we are not faced with markup vs. NCRs. In fact, as I edit
> Hebrew or Arabic text, I now prefer to use the Unicode control codes,
> because then the plain text is WYSIWYG and I can see how the result will
> appear.
> If I instead use markup controls, when I look at the source of my HTML
> or XML, it is not WYSIWYG and it is very difficult to make appropriate
> bidirectional edits.
>
> Using markup instead of the control characters expands the size of the
> file.
>
> It also now seems to run against the grain of our other I18n
> recommendations, for example to use character encodings that support all
> of the characters used in Web documents or applications, so that NCRs
> are not needed and to enhance readability.
>
> Many Web pages are not static and are composed from dynamic elements
> including databases, localization systems and templates, etc. The
> components of these systems are often used in multiple ways, sometimes
> with markup and sometimes with plaintext and other environments. This
> necessitates careful policies and extra conversions between character
> and markup choices to satisfy the recommendation.
>
> In all of these situations, it makes more sense to me to use the Unicode
> bidi control codes, and not use markup.
> For other kinds of controls, where markup offers additional
> capabilities, readability, etc., the recommendations of TR20 make sense.
> But for bidi, there is one-to-one equivalency, and no advantage, and
> even some disadvantages (size, loss of WYSIWYG source, etc.)
>
> There are also of course many places in HTML where you would like to use
> bidirectional text, but cannot use markup and are forced to use control
> codes (e.g. attributes).
>
> I therefore cannot support the recommendation to favor markup over the
> bidi control characters. It is frankly more expedient to use the control
> codes, have a single approach for text whether it is in markup,
> attributes, plaintext, etc., and smaller files, and less processing
> (conversions to markup and parsing of markup).
>
> I would like to hear arguments to the contrary. Actually, I would like
> to suggest we consider changing the recommendations so that the bidi
> control codes are favored. It seems to me they are so inherent to text
> processing that they probably belong as control characters and not in
> markup at all. (Now that text processors support them.)
>
> How do other folks working with bidi see this?
>
> tex
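
For readers comparing the two inline forms discussed in this thread, here is
a minimal Python sketch of the HTML 4.0 Hebrew-quotation example written both
ways. The function names (quote_with_markup, quote_with_controls, to_ncrs) are
hypothetical and chosen only for this illustration; the code builds the
strings and does not itself run the Unicode bidi algorithm or render anything.

```python
import html

# Unicode bidi control characters used in the HTML 4.0 example.
RLE = "\u202B"        # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"        # POP DIRECTIONAL FORMATTING
GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM (the quotation mark)


def quote_with_markup(text, lang="he"):
    """Markup form: direction is carried by the dir attribute on Q."""
    return f'<Q lang="{lang}" dir="rtl">{html.escape(text)}</Q>'


def quote_with_controls(text):
    """Control-character form: direction is carried inside the text itself."""
    return f"{RLE}{GERSHAYIM}{text}{GERSHAYIM}{PDF}"


def to_ncrs(s):
    """Write every non-ASCII character as a numeric character reference."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):04X};" for c in s)


sample = "...a Hebrew quotation..."
print(quote_with_markup(sample))
# <Q lang="he" dir="rtl">...a Hebrew quotation...</Q>
print(to_ncrs(quote_with_controls(sample)))
# &#x202B;&#x05F4;...a Hebrew quotation...&#x05F4;&#x202C;
```

The control-character string can be placed anywhere plain text is allowed,
including attribute values, whereas the markup form is only available in
element content. Neither form addresses the block-level base direction raised
in point 1.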
Received on Tuesday, 2 August 2005 17:43:59 UTC