
RE: WebVTT bidi: can we have &lrm; and &rlm; escapes?

From: Phillips, Addison <addison@lab126.com>
Date: Thu, 8 Dec 2011 09:21:15 -0800
To: Kent Karlsson <kent.karlsson14@telia.com>, "Aharon (Vladimir) Lanin" <aharon@google.com>, "public-texttracks@w3.org" <public-texttracks@w3.org>, "public-i18n-bidi@w3.org" <public-i18n-bidi@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476AA5BBE7EB@EX-SEA31-D.ant.amazon.com>
Blar! I read the characters incorrectly. The marks are not sequence forming, and Kent is correct.

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.



From: Kent Karlsson [mailto:kent.karlsson14@telia.com]
Sent: Thursday, December 08, 2011 8:58 AM
To: Phillips, Addison; Aharon (Vladimir) Lanin; public-texttracks@w3.org; public-i18n-bidi@w3.org
Subject: Re: WebVTT bidi: can we have &lrm; and &rlm; escapes?

These marks, no, they are not terminated by anything. They are freestanding.

LRE, LRO, RLE, and RLO are terminated (by PDF), since they do start a "span", but the marks don't.
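Kent's distinction can be made concrete with a small checker. This is a minimal Python sketch that relies only on the standard code points (LRM U+200E, RLM U+200F, LRE U+202A, RLE U+202B, PDF U+202C, LRO U+202D, RLO U+202E). The function name `unterminated_spans` is hypothetical, and the sketch ignores the nesting-depth limit and the fact that the bidi algorithm implicitly closes every open span at the end of a paragraph.

```python
# Bidi formatting characters, grouped by whether they open a span that
# must later be closed by PDF (U+202C).
LRM, RLM = "\u200E", "\u200F"   # freestanding marks: nothing to terminate
LRE, RLE = "\u202A", "\u202B"   # embeddings: open a span
LRO, RLO = "\u202D", "\u202E"   # overrides: open a span
PDF = "\u202C"                  # pops the most recent embedding/override

SPAN_OPENERS = {LRE, RLE, LRO, RLO}


def unterminated_spans(text: str) -> int:
    """Return how many embeddings/overrides are still open at the end of text.

    LRM and RLM are skipped: each is a single strong character and does not
    start anything that needs a PDF. A PDF with no open span is ignored.
    """
    depth = 0
    for ch in text:
        if ch in SPAN_OPENERS:
            depth += 1
        elif ch == PDF and depth > 0:
            depth -= 1
    return depth


print(unterminated_spans("abc" + RLM + "def"))  # 0: the mark needs no PDF
print(unterminated_spans(RLE + "abc"))          # 1: RLE is still open
print(unterminated_spans(RLE + "abc" + PDF))    # 0: RLE closed by PDF
```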

    /Kent K


Den 2011-12-08 17:48, skrev "Phillips, Addison" <addison@lab126.com>:
You need a third character: U+202C (PDF). Sequences starting with RLM or LRM are terminated using this character. See: http://www.w3.org/International/questions/qa-bidi-controls.en.php


Addison





From: Aharon (Vladimir) Lanin [mailto:aharon@google.com]
Sent: Thursday, December 08, 2011 12:43 AM
To: public-texttracks@w3.org; public-i18n-bidi@w3.org
Subject: WebVTT bidi: can we have &lrm; and &rlm; escapes?


The WebVTT spec currently allows just three escapes: &lt;, &gt;, and &amp;. Authors are expected to enter any other characters directly by whatever other means they have at their disposal.



I would like to suggest that an exception is needed for two more characters, LRM and RLM. These are invisible characters with strong directionality, LTR for one and RTL for the other. These are used in bidi text in two ways:



- At the start of a paragraph, one of these can be used to indicate the paragraph's overall directionality in contexts where the directionality is determined by the paragraph's first character with strong direction. This is the default method of determining paragraph direction specified by the Unicode Bidirectional Algorithm - and the *only* method allowed by the current WebVTT spec. It is important to note that RTL languages fairly often use "words" spelled in LTR characters, e.g. acronyms like GPS and HTML (and WebVTT), as well as brand names. Occasionally, these occur as the first word in a sentence or even a paragraph, and when this is the case, the overall directionality of the paragraph is set incorrectly, unless one puts an RLM at the beginning of the paragraph.



- In bidi text, these characters provide some means of control over the visual ordering of the characters. For example, to get "Mamma Mia!" to come out that way - and not as "!Mamma Mia" - in RTL text, one can put an LRM after the exclamation mark. In HTML, there are other means of such control, such as wrapping opposite-direction phrases in <span dir=...> or in a <bdi> element. But such means are absent in WebVTT.
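Both uses can be sketched with Python's standard `unicodedata.bidirectional()`, which returns a character's Bidi_Class (LRM reports `L`, RLM reports `R`). This is an illustrative sketch, not a bidi implementation: `first_strong_direction` and `protect_trailing_neutrals` are hypothetical helpers, and the first one does not skip isolate sequences (LRI/RLI/FSI ... PDI) as later versions of the Unicode Bidirectional Algorithm require.

```python
import unicodedata

LRM, RLM = "\u200E", "\u200F"


def first_strong_direction(paragraph: str) -> str:
    """Return 'rtl' or 'ltr' from the first strong character (UBA rules P2/P3).

    Characters of class R or AL give RTL; class L gives LTR. With no strong
    character at all, the default is LTR.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"


def protect_trailing_neutrals(ltr_phrase: str) -> str:
    """Append LRM if any characters follow the phrase's last L character.

    In an RTL paragraph, a neutral such as '!' sitting between an LTR run and
    following RTL text (or the paragraph end) takes the paragraph direction
    and is displayed on the far side of the run ("!Mamma Mia"). An LRM after
    it gives it an L neighbour on both sides, so it stays with the run.
    """
    last_l = max((i for i, ch in enumerate(ltr_phrase)
                  if unicodedata.bidirectional(ch) == "L"), default=-1)
    if last_l == -1 or last_l == len(ltr_phrase) - 1:
        return ltr_phrase
    return ltr_phrase + LRM


hebrew = "\u05e9\u05dc\u05d5\u05dd"  # "shalom"
print(first_strong_direction("GPS " + hebrew))        # ltr: wrong for a Hebrew sentence
print(first_strong_direction(RLM + "GPS " + hebrew))  # rtl: fixed by the leading RLM
print(protect_trailing_neutrals("Mamma Mia!") == "Mamma Mia!" + LRM)  # True
```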



There are several reasons that I think an exception should be made for these characters and escapes provided for them in WebVTT:



1. As mentioned above, WebVTT does not provide any means for controlling paragraph directionality or inline directionality explicitly. Thus, the author has no means but LRM and RLM for such control in a WebVTT file.



2. LRM and RLM are invisible. Entering invisible characters and editing text that already contains them is confusing.



3. The existing standard Hebrew and Arabic keyboards do not provide a means of generating an actual LRM or RLM. Although the Windows native TextBox control provides a context menu that allows inserting various special characters including LRM and RLM, and Microsoft Notepad uses TextBox and thus provides the same context menu, most reasonable text editors available on Windows (e.g. Notepad++) are not based on TextBox and do not provide a means for generating LRM and RLM. The same, as far as I know, is true for Linux (e.g. gedit).
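To illustrate what the proposal would mean for a cue-text decoder, here is a hedged Python sketch. It assumes the escape names `&lrm;` and `&rlm;` as proposed in this message (they are not in the spec text), handles only escapes (not cue tags or the spec's actual tokenizer), and uses hypothetical function names. `show_marks` addresses point 2 by rendering the invisible marks visibly, as an editor might.

```python
import re

# The three escapes WebVTT currently allows, plus the two proposed here.
EXISTING = {"&lt;": "<", "&gt;": ">", "&amp;": "&"}
PROPOSED = {"&lrm;": "\u200E", "&rlm;": "\u200F"}  # proposal only, not spec
ESCAPES = {**EXISTING, **PROPOSED}

# One alternation over all known escape names.
_ESCAPE_RE = re.compile("|".join(re.escape(name) for name in ESCAPES))


def decode_cue_text(raw: str) -> str:
    """Replace known escapes in a single left-to-right pass.

    A single pass means "&amp;lt;" decodes to the literal text "&lt;",
    not to "<". Any other '&' sequence is left untouched.
    """
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], raw)


def show_marks(text: str) -> str:
    """Render invisible LRM/RLM as their escape names for display."""
    return text.replace("\u200E", "&lrm;").replace("\u200F", "&rlm;")


cue = decode_cue_text("&rlm;GPS &amp; more")
print(show_marks(cue))  # &rlm;GPS & more
```

The single regex pass is the design choice worth noting: replacing escapes one name at a time with `str.replace` would turn `&amp;lt;` into `<`, which is wrong.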



Aharon
Received on Thursday, 8 December 2011 17:56:55 UTC
