- From: Richard Ishida <ishida@w3.org>
- Date: Tue, 2 Aug 2005 16:23:52 +0100
- To: "'Tex Texin'" <tex@xencraft.com>, "'WWW International'" <www-international@w3.org>
A couple of potential points that spring to mind for now, until I have time to think this through in more detail:

[1] We should clarify that the issue here is for the constructs that set embedding levels - not the RLM and LRM, where the use of Unicode rather than markup is recommended (although these may sometimes be escaped, for visibility).

[2] Will people be able to easily manage the scope and presence/absence of such invisible characters?

[3] How do you manage inheritance elegantly using this approach?

There are two issues here: a) whether you can replace, say, <html dir="rtl"...> or <table dir="rlo"...> effectively with Unicode control characters, and b) whether people can tell where these things are easily enough to manage them effectively.

RI

============
Richard Ishida
W3C

contact info:
http://www.w3.org/People/Ishida/

W3C Internationalization:
http://www.w3.org/International/

Publication blog:
http://people.w3.org/rishida/blog/

> -----Original Message-----
> From: www-international-request@w3.org
> [mailto:www-international-request@w3.org] On Behalf Of Tex Texin
> Sent: 02 August 2005 12:53
> To: WWW International
> Subject: Bidi Markup vs Unicode control characters
>
> This has been bothering me for a while and I would like to
> see if anyone has a better answer as to why we recommend
> markup over bidi controls.
>
> The recommendation in HTML 4.0, and also in the joint
> recommendation of the W3C and Unicode Consortium on Unicode
> in XML and Markup Languages, is that the bidirectional markup
> is to be preferred over the Unicode control characters.
>
> The argument is made in
> http://www.w3.org/TR/REC-html40/struct/dirlang.html#h-8.2
> and referenced by http://www.unicode.org/reports/tr20/
> (section 2+ and 3.1).
>
> In particular, HTML 4.0 says:
> =============
> Although Unicode specifies special characters that deal with
> text direction, HTML offers higher-level markup constructs
> that do the same thing: the dir attribute (do not confuse
> with the DIR element) and the BDO element. Thus, to express
> a Hebrew quotation, it is more intuitive to write
>
> <Q lang="he" dir="rtl">...a Hebrew quotation...</Q>
>
> than the equivalent with Unicode references:
>
> &#x202B;&#x05F4;...a Hebrew quotation...&#x05F4;&#x202C;
> =================
>
> Now several years ago, I agreed with this. However, several
> years ago, most editors had not implemented the Unicode bidi
> algorithm and did not display bidirectional plain text
> properly. So we were (at least I was) doing a lot of hand
> editing and it was not WYSIWYG.
>
> Today the situation is very different. Many editors implement
> and support the Unicode bidirectional algorithm and the
> associated control codes.
>
> So today, we are not faced with markup vs. NCRs. In fact, as
> I edit Hebrew or Arabic text, I now prefer to use the Unicode
> control codes, because then the plain text is WYSIWYG and I
> can see how the result will appear.
> If I instead use markup controls, when I look at the source
> of my HTML or XML, it is not WYSIWYG and it is very difficult
> to make appropriate bidirectional edits.
>
> Using markup instead of the control characters expands the
> size of the file.
>
> It also now seems to run against the grain of our other I18n
> recommendations, for example to use character encodings that
> support all of the characters used in Web documents or
> applications, so that NCRs are not needed and readability is
> enhanced.
>
> Many Web pages are not static and are composed from dynamic
> elements including databases, localization systems and
> templates, etc. The components of these systems are often
> used in multiple ways, sometimes with markup and sometimes
> with plaintext and other environments.
> This necessitates careful policies and extra conversions
> between character and markup choices to satisfy the
> recommendation.
>
> In all of these situations, it makes more sense to me to use
> the Unicode bidi control codes, and not use markup.
> For other kinds of controls, where markup offers additional
> capabilities, readability, etc., the recommendations of TR20
> make sense.
> But for bidi, there is one-to-one equivalency, and no
> advantage, and even some disadvantages (size, loss of WYSIWYG
> source, etc.).
>
> There are also of course many places in HTML where you would
> like to use bidirectional text, but cannot use markup and are
> forced to use control codes (e.g. attributes).
>
> I therefore cannot support the recommendation to favor markup
> over the bidi control characters. It is frankly more
> expedient to use the control codes, to have a single approach
> for text whether it is in markup, attributes, plaintext, etc.,
> and to get smaller files and less processing (conversions to
> markup and parsing of markup).
>
> I would like to hear arguments to the contrary. Actually, I
> would like to suggest we consider changing the
> recommendations so that the bidi control codes are favored.
> It seems to me they are so inherent to text processing that
> they probably belong as control characters and not in markup
> at all. (Now that text processors support them.)
>
> How do other folks working with bidi see this?
>
> tex
>
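
As a rough illustration of points [1] and [2] above, the following Python sketch shows one way the invisible controls might be made visible and their scope checked. It is only a sketch under stated assumptions: the function names escape_bidi_controls and embeddings_balanced are hypothetical, only the code points themselves come from Unicode, and treating a stray PDF as an error is a lint-style choice made here (the bidi algorithm itself simply ignores an unmatched PDF).

```python
# Hypothetical helpers for illustration; only the code points are from Unicode.
BIDI_CONTROLS = {
    "\u200E": "LRM",  # left-to-right mark
    "\u200F": "RLM",  # right-to-left mark
    "\u202A": "LRE",  # left-to-right embedding
    "\u202B": "RLE",  # right-to-left embedding
    "\u202C": "PDF",  # pop directional formatting
    "\u202D": "LRO",  # left-to-right override
    "\u202E": "RLO",  # right-to-left override
}

# Characters that open an embedding or override, each closed by one PDF.
EMBEDDING_OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}
PDF = "\u202C"


def escape_bidi_controls(text: str) -> str:
    """Replace each bidi control with a numeric character reference."""
    return "".join(
        f"&#x{ord(ch):04X};" if ch in BIDI_CONTROLS else ch for ch in text
    )


def embeddings_balanced(text: str) -> bool:
    """Return True if every LRE/RLE/LRO/RLO has a matching PDF.

    Assumption: a PDF with no open embedding is reported as unbalanced,
    which is stricter than the bidi algorithm requires.
    """
    depth = 0
    for ch in text:
        if ch in EMBEDDING_OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False
            depth -= 1
    return depth == 0


if __name__ == "__main__":
    # The Hebrew quotation from the HTML 4.0 example, using RLE ... PDF.
    quote = "\u202B\u05F4...a Hebrew quotation...\u05F4\u202C"
    print(escape_bidi_controls(quote))
    print(embeddings_balanced(quote))
```

A check like embeddings_balanced speaks to the scope question: markup gets well-formedness checking from the parser, whereas paired control characters need some separate tool of this kind.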
Received on Tuesday, 2 August 2005 15:23:59 UTC