Re: [iri] #118: What term to use for the kind of text that the Unicode Bidi Algorithm was designed for

Hello Mati,

Many thanks for your comments.

On 2012/03/11 22:07, Matitiahu Allouche wrote:
> Since the question is related to Unicode (the kind of text that the
> Unicode Bidi Algorithm was designed for), maybe we should check the
> Unicode definition for "plain text". In the Unicode glossary (
> http://unicode.org/glossary/#P), we find:
> Plain Text. Computer-encoded text that consists only of a sequence of code
> points from a given standard, with no other formatting or structural
> information. Plain text interchange is commonly used between computer
> systems that do not share higher-level protocols. (See also rich text.)
>
>
> Personally, I find this definition appropriate for "the kind of text that
> the Unicode Bidi Algorithm was designed for", and I prefer "plain text"
> over "running text". It is also my experience that "plain text" is much
> more in use in Unicode circles than "running text".

I agree that if we look at the distinction between plain text and rich 
text, then it is appropriate to say that the Bidi Algorithm has been 
designed for plain text rather than for rich text. But in the two places 
in the spec where we have been using "running text" for the past seven 
or more years, it's NOT this distinction between plain text and rich 
text that we are after.

To be more specific, it's irrelevant whether an IRI shows up in a plain 
text file (.txt) or a rich text file (e.g. MS Word, HTML with 
stylesheets, ...). We have exactly the same problems with bidi IRIs in 
plain text as we have in rich text. This is because although the Bidi 
Algorithm was designed for plain text, essentially the same algorithm is 
used for rich text. MS Word usually has a few tweaks that make it 
behave slightly differently from the Unicode Bidi Algorithm (the most 
recent being the special behavior regarding parentheses that was 
presented and discussed at last year's IUC), but the basics are the 
same. Rendered HTML also uses the Unicode Bidi Algorithm for its basic 
features.

What the spec is referring to is the fact that the Bidi Algorithm was 
designed for sequences of characters, words, and punctuation such as 
they turn up in letters, newspaper articles, explanatory text in books, 
and so on, as opposed to sequences of characters as they turn up in 
artificial stuff such as IRIs, markup source, programming languages, and 
so on.

I'm not sure whether "running text" is the best term for this, but I am 
very sure "plain text" is wrong for where we want to use it, because 
IRIs, markup source, programs, and so on are in many if not most cases 
plain text themselves. "Running text" at least seems to come close; see 
e.g. the definition at http://en.wiktionary.org/wiki/running_text.

Regards,   Martin.



> Shalom (Regards),  Mati
>         Bidi Architect
>         Globalization Center Of Competency - Bidirectional Scripts
>         IBM Israel
>         Mobile: +972 52 2554160
>
>
>
>
> From:   "iri issue tracker"<trac+iri@trac.tools.ietf.org>
> To:     draft-ietf-iri-3987bis@tools.ietf.org, duerst@it.aoyama.ac.jp
> Cc:     public-iri@w3.org
> Date:   11/03/2012 14:03
> Subject:        [iri] #118: What term to use for the kind of text that the
> Unicode Bidi Algorithm was designed for
>
>
>
> #118: What term to use for the kind of text that the Unicode Bidi
> Algorithm was designed for
>
>   What term should we use for the kind of text that the Unicode Bidi
>   Algorithm was designed for? RFC 3987 and 3987bis use "running text".
>   bidi-guidelines (-01) changed to "plain text".
>
>   We have a definition for running text at
>   http://tools.ietf.org/html/draft-ietf-iri-3987bis-10#section-1.3:
>
>       running text:  Human text (paragraphs, sentences, phrases) with
>          syntax according to orthographic conventions of a natural
>          language, as opposed to syntax defined for ease of processing by
>          machines (e.g., markup, programming languages).
>
>   In RFC 3987, there are two uses:
>
>   The Unicode Bidirectional Algorithm is designed mainly for running text.
>
>   [UNIXML] is written in the context of running text rather than in that of
>   identifiers.
>
>   The first use moved to bidi-guidelines, but the second use is still in
>   3987bis. In both cases, the term "plain text" isn't appropriate, because
>   the main use of "plain text" is to distinguish from "fancy text", i.e.
>   text with styling,... But in both usages above, the distinction between
>   "plain text" and "fancy text" is irrelevant. See also
>   http://en.wikipedia.org/wiki/Plain_text.
>

Received on Monday, 12 March 2012 03:06:28 UTC