Re: Bidi controls vs markup revamp

As for spans vs. marks, both can be used, and there are pros and cons either
way. The marks basically act as if you had an invisible Arabic (RLM) or
English (LRM) character at that position, which pushes neutrals in the right
direction (many here know this, but I mention it for those who don't). So
where we are concatenating text with separators, the marks work fine for
making sure that the separators have the right directionality. See the
working draft of TR#9 for examples.
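A minimal Python sketch of the separator case, assuming the goal is for each separator to take the paragraph's base direction. The helper name `join_with_marks` is hypothetical; U+200E and U+200F are the code points behind `&lrm;` and `&rlm;`. Putting a mark on both sides of the separator means it always sits between two strong characters of the base direction, whatever the items start or end with.

```python
# U+200E LEFT-TO-RIGHT MARK (&lrm;) and U+200F RIGHT-TO-LEFT MARK (&rlm;)
LRM = "\u200e"
RLM = "\u200f"


def join_with_marks(items, separator, base_dir):
    """Hypothetical helper: join items so the separators follow base_dir.

    base_dir is "ltr" or "rtl". A mark of that direction is placed on each
    side of the separator, acting like an invisible strong character, so
    neutral separator characters (such as ", ") resolve to that direction.
    """
    if base_dir not in ("ltr", "rtl"):
        raise ValueError("base_dir must be 'ltr' or 'rtl'")
    mark = RLM if base_dir == "rtl" else LRM
    return (mark + separator + mark).join(items)


# Two English names listed inside a right-to-left (e.g. Arabic) paragraph.
print(repr(join_with_marks(["Google", "Yahoo"], ", ", "rtl")))
```

The result is still plain text, so it survives copy and paste unchanged, and the marks affect only the neighbouring separator.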

Cut and paste, and extracting plaintext, also work better with marks, since
no program that I know of converts back and forth between direction spans
and embedding characters when going from HTML to plaintext (or to formatted
text that has no equivalent of directional spans). A further issue we've hit
is that when extracting text (e.g. for displaying snippets on a search results
page), handling spans correctly requires the context of the entire document
(such as an inherited dir="ltr") in order to wrap the extracted text in the
right spans. Pasting text also means deciding whether spans in the pasted
text should be embedded within the surrounding text or merged with it. All
of this is doable, but complicated enough that programmers can easily screw
it up...
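The span-to-embedding conversion described above might look something like this minimal Python sketch, built on the standard library's `html.parser`. It maps `dir="ltr"`/`dir="rtl"` to LRE/RLE (U+202A/U+202B) and the matching end tag to PDF (U+202C). The names `BidiTextExtractor` and `html_to_bidi_plaintext`, the `inherited_dir` parameter (standing in for the document-level direction a snippet loses), and the simplified handling of void and unclosed elements are all assumptions; it ignores `<bdo>`, `dir="auto"`, and CSS `direction`.

```python
from html.parser import HTMLParser

# Embedding controls: LEFT-TO-RIGHT EMBEDDING, RIGHT-TO-LEFT EMBEDDING,
# POP DIRECTIONAL FORMATTING.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"
EMBEDDING = {"ltr": LRE, "rtl": RLE}
# Elements that never have an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BidiTextExtractor(HTMLParser):
    """Hypothetical: collects text, turning dir attributes into embeddings."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &lrm; arrives as U+200E
        self.out = []
        self.stack = []  # (tag, whether an embedding was opened for it)

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        direction = (dict(attrs).get("dir") or "").lower()
        code = EMBEDDING.get(direction)
        if code:
            self.out.append(code)
        self.stack.append((tag, code is not None))

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop back to the matching start tag,
        # closing any embeddings left open by unclosed elements in between.
        if all(open_tag != tag for open_tag, _ in self.stack):
            return
        while self.stack:
            open_tag, embedded = self.stack.pop()
            if embedded:
                self.out.append(PDF)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)


def html_to_bidi_plaintext(markup, inherited_dir=None):
    """Hypothetical helper: extract plaintext while keeping span directions.

    inherited_dir ("ltr" or "rtl") supplies the surrounding document's
    direction, e.g. a dir on <html> that lies outside the extracted snippet.
    """
    parser = BidiTextExtractor()
    parser.feed(markup)
    parser.close()
    # Close embeddings whose elements were never closed in the snippet.
    closing = [PDF for _, embedded in parser.stack if embedded]
    text = "".join(parser.out + closing)
    if inherited_dir in EMBEDDING:
        text = EMBEDDING[inherited_dir] + text + PDF
    return text


print(repr(html_to_bidi_plaintext('Title: <span dir="rtl">abc</span>!', "ltr")))
```

Even this small version has to make the judgment calls mentioned above (what the inherited direction is, what to do with unbalanced markup), which is exactly where real implementations tend to go wrong.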

If you're really embedding text, such as a quotation or some other segment
of text, then spans are definitely the way to go. They can also be used to
handle separators, by embedding everything except the separators. I myself
find the marks just as simple or simpler in that case, and they have the
advantages that copy and paste preserves the text ordering and that their
effects are local.
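For comparison, a minimal Python sketch of the span approach to the same separator problem: each item is wrapped in a `dir` span and the separators are left outside. `join_with_spans` is a hypothetical helper, and it assumes the items are plain text that needs HTML escaping. It only produces the markup; how the unwrapped separators then resolve is up to the renderer's bidi handling.

```python
import html


def join_with_spans(items, separator, item_dir):
    """Hypothetical helper: embed each item in a dir span, not the separators.

    item_dir is "ltr" or "rtl". Only the items are wrapped; the separators
    stay in the surrounding text. Items and separator are HTML-escaped.
    """
    if item_dir not in ("ltr", "rtl"):
        raise ValueError("item_dir must be 'ltr' or 'rtl'")
    sep = html.escape(separator)
    return sep.join(
        f'<span dir="{item_dir}">{html.escape(item)}</span>' for item in items
    )


print(join_with_spans(["Google", "Yahoo"], ", ", "ltr"))
```

Unlike the marks, this directional information lives only in the markup, so it is lost when the text is copied out as plaintext.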

Mark

On Nov 27, 2007 11:22 PM, Martin Duerst <duerst@it.aoyama.ac.jp> wrote:

>
> At 17:42 07/11/27, Richard Ishida wrote:
> >
> >Editorial changes made.
> >
> >> I also think that the term 'paired' is a bit of a problem,
> >> because it is new and it doesn't explain the problem, and
> >> even &lrm; or so could be paired in some way, e.g. as in &lrm;,&lrm;.
> >
> >When would you ever need to have &lrm;,&lrm; ?
>
> It's one way to unambiguously give some weak character some
> strong directionality. In many specific cases, it may not be
> the simplest way to do things, but we should not forget that:
>
> 1) Documents may be edited. A single &lrm; may work wonders in
>   a specific case, but spanning markup may be much more
>   stable under edits.
>
> 2) In many contexts, bidi markup (or control characters) has
>   to be designed to be useful in varied contexts, e.g. for
>   documents created by scripts.
>
> Regards,   Martin.
>
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp

Received on Wednesday, 28 November 2007 16:47:18 UTC