
Re: Bidi controls vs markup revamp

From: Mark Davis <mark.davis@icu-project.org>
Date: Wed, 28 Nov 2007 08:47:08 -0800
Message-ID: <30b660a20711280847p1e7f0bdte0fdfa29f2c8e2ad@mail.gmail.com>
To: "Martin Duerst" <duerst@it.aoyama.ac.jp>
Cc: "Richard Ishida" <ishida@w3.org>, public-i18n-core@w3.org
As for spans vs. marks, both can be used, and there are pros and cons either
way. The marks basically just act as if you had an invisible Arabic or
English character at that position, which pushes neutrals in the right
direction (many here know this, but just for those who don't). So for cases
where we are concatenating text with separators, it works fine to use them
to make sure that the separators have the right directionality. See the
working draft of TR#9 for examples.
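
A minimal sketch of the separator case in Python, for those who want to try it. The helper name join_with_marks and its parameters are made up for illustration; only the unicodedata module is real. It puts a mark on each side of every separator (the &lrm;,&lrm; form discussed in the quoted message below), though a single mark is often enough in a given context.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def join_with_marks(items, sep=", ", mark=LRM):
    # Hypothetical helper: surround each separator with the same mark, so the
    # separator sits between two strong characters of one direction no matter
    # what script the neighbouring items are in.
    return (mark + sep + mark).join(items)


# Bidi classes from the Unicode Character Database, showing why this works:
# the marks are strong, the comma is not.
print(unicodedata.bidirectional(LRM))  # strong left-to-right class
print(unicodedata.bidirectional(RLM))  # strong right-to-left class
print(unicodedata.bidirectional(","))  # common separator, a weak class

# Two Hebrew letters joined so that the comma stays left-to-right.
joined = join_with_marks(["\u05d0", "\u05d1"])
```

The marks stay in the string itself, so they survive copy and paste as plain text.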

Cut and paste of text and extracting the plaintext also work better with
marks, since no program that I know of converts back and forth between
direction spans and embedding characters when going from HTML to plaintext
(or to formatted text without the equivalent of directional spans). A
further issue we've hit is that when extracting text (e.g. for displaying
snippets on a search page), dealing with spans requires the context of the
entire document (such as the dir="ltr") in order to give the extracted text
the right spans around it. Pasting text also means deciding whether spans in
the pasted text are to be embedded relative to the surrounding text or
merged with it. All of this is doable, but complicated enough to be screwed
up by programmers...
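
To make the HTML-to-plaintext conversion concrete, here is a rough sketch of what such a converter might look like, using Python's standard html.parser. It is not a description of any existing tool. The class and function names are invented, it treats every dir attribute as an embedding, and it ignores overrides (bdo) and direction inherited from outside the fragment, which is exactly the document-context problem described above.

```python
from html.parser import HTMLParser

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Elements that never have an end tag (a partial list, for illustration).
VOID = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class DirToEmbedding(HTMLParser):
    """Hypothetical converter: dir="ltr"/"rtl" elements become LRE/RLE ... PDF."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.stack = []  # (tag, opened_embedding) for each open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        direction = (dict(attrs).get("dir") or "").lower()
        mark = {"ltr": LRE, "rtl": RLE}.get(direction)
        if mark:
            self.out.append(mark)
        self.stack.append((tag, mark is not None))

    def handle_endtag(self, tag):
        # Ignore stray end tags that match nothing currently open.
        if tag not in [open_tag for open_tag, _ in self.stack]:
            return
        # Pop back to the matching element, closing embeddings on the way.
        while self.stack:
            open_tag, embedded = self.stack.pop()
            if embedded:
                self.out.append(PDF)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)


def html_to_plaintext(html):
    parser = DirToEmbedding()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)
```

Going the other way (embedding characters back to spans) and deciding what to do with the enclosing document's direction are left out, which is where most of the complication lies.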

If you're really embedding text, such as a quotation or segment of text,
then spans are definitely the way to go. They can also be used to deal with
separators, by embedding everything except the separators. I myself find the
marks just as simple or simpler in that case, and they do have the
advantages that a copy and paste preserves the text ordering, and that their
effects are local.
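
For comparison, the span approach to separators (embedding everything except the separators) might be sketched like this. The helper name join_with_spans and the item_dir parameter are made up for illustration; html.escape is from the Python standard library.

```python
from html import escape


def join_with_spans(items, item_dir="rtl", sep=", "):
    # Hypothetical helper: wrap each item in its own directional span and
    # leave the separators outside, so they take the direction of the
    # surrounding text rather than that of the items.
    spans = [
        '<span dir="{}">{}</span>'.format(escape(item_dir), escape(item))
        for item in items
    ]
    return escape(sep).join(spans)


markup = join_with_spans(["\u05d0\u05d1", "\u05d2\u05d3"])
```

Unlike the marks, the direction information here lives in the markup, so it is lost if the text is extracted without its spans.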

Mark

On Nov 27, 2007 11:22 PM, Martin Duerst <duerst@it.aoyama.ac.jp> wrote:

>
> At 17:42 07/11/27, Richard Ishida wrote:
> >
> >Editorial changes made.
> >
> >> I also think that the term 'paired' is a bit of a problem,
> >> because it is new and it doesn't explain the problem, and
> >> even &lrm; or so could be paired in some way, e.g. as in &lrm;,&lrm;.
> >
> >When would you ever need to have &lrm;,&lrm; ?
>
> It's one way to unambiguously give some weak character some
> strong directionality. In many specific cases, it may not be
> the simplest way to do things, but we should not forget that:
>
> 1) Documents may be edited. A single &lrm; may work wonders in
>   a specific case, but spanning markup may be much more
>   stable under edits.
>
> 2) In many contexts, bidi markup (or control characters) has
>   to be designed to be useful in varied contexts, e.g. for
>   documents created by scripts.
>
> Regards,   Martin.
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
>
>
>


-- 
Mark
Received on Wednesday, 28 November 2007 16:47:18 GMT
