WebVTT bidi: can we have &lrm; and &rlm; escapes?

The WebVTT spec currently allows just three escapes: &lt;, &gt;, and &amp;.
Authors are expected to enter any other characters directly by whatever
other means they have at their disposal.

I would like to suggest that an exception is needed for two more
characters, LRM (U+200E LEFT-TO-RIGHT MARK) and RLM (U+200F RIGHT-TO-LEFT
MARK). These are invisible characters with strong directionality, LTR for
LRM and RTL for RLM. They are used in bidi text in two ways:

- At the start of a paragraph, one of these can be used to indicate the
paragraph's overall directionality in contexts where the directionality is
determined by the paragraph's first character with strong direction. This
is the default method of determining paragraph direction specified by the
Unicode Bidirectional Algorithm - and the *only* method allowed by the
current WebVTT spec. It is important to note that RTL languages fairly
often use "words" spelled in LTR characters, e.g. acronyms like GPS and
HTML (and WebVTT), as well as brand names. Occasionally, these occur as the
first word in a sentence or even a paragraph, and when this is the case,
the overall directionality of the paragraph is set incorrectly, unless one
puts an RLM at the beginning of the paragraph.

- In bidi text, these characters provide some means of control over the
visual ordering of the characters. For example, to get "Mamma Mia!" to come
out that way - and not as "!Mamma Mia" - in RTL text, one can put an LRM
after the exclamation mark. In HTML, there are other means of such control,
such as wrapping opposite-direction phrases in <span dir=...> or in a <bdi>
element. But such means are absent in WebVTT.
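
To make the first point concrete, here is a rough Python sketch of the
"first strong character" heuristic. It is a simplification, not the full
Unicode Bidirectional Algorithm: it ignores embeddings and isolates, and
the function name and the sample cue text are my own, chosen only for
illustration.

```python
import unicodedata

RLM = "\u200F"  # RIGHT-TO-LEFT MARK, bidi class R
LRM = "\u200E"  # LEFT-TO-RIGHT MARK, bidi class L


def first_strong_direction(text, default="ltr"):
    """Return 'ltr' or 'rtl' based on the first strongly directional character.

    Simplified sketch: only the bidi classes L, R and AL are treated as
    strong, and explicit embeddings/isolates are not handled.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found


# Hypothetical Hebrew cue that happens to start with a Latin acronym.
hebrew_cue = "GPS \u05D4\u05D5\u05D0 \u05E9\u05D9\u05E8\u05D5\u05EA"

print(first_strong_direction(hebrew_cue))        # ltr (wrong for a Hebrew cue)
print(first_strong_direction(RLM + hebrew_cue))  # rtl (fixed by the leading RLM)
```

The leading RLM changes nothing visible in the cue, but it is the first
strong character, so it decides the paragraph direction.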

There are several reasons that I think an exception should be made for
these characters and escapes provided for them in WebVTT:

1. As mentioned above, WebVTT does not provide any means for controlling
paragraph directionality or inline directionality explicitly. Thus, the
author has no means but LRM and RLM for such control in a WebVTT file.

2. LRM and RLM are invisible. Entering invisible characters and editing
text that already contains them is confusing.

3. The existing standard Hebrew and Arabic keyboards do not provide a means
of generating an actual LRM or RLM. Although the Windows native TextBox
control provides a context menu that allows inserting various special
characters including LRM and RLM, and Microsoft Notepad uses TextBox and
thus provides the same context menu, most reasonable text editors available
on Windows (e.g. Notepad++) are not based on TextBox and do not provide a
means for generating LRM and RLM. The same, as far as I know, is true for
Linux (e.g. gedit).
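
For illustration only, here is how a cue-text parser might decode the two
proposed escapes alongside the three existing ones. This is a sketch of
the proposal, not of anything in the current spec; the table and function
names are hypothetical, and leaving unknown escapes untouched is my
assumption rather than specified behaviour.

```python
import re

# Existing WebVTT escapes plus the two proposed here (lrm, rlm).
ESCAPES = {
    "lt": "<",
    "gt": ">",
    "amp": "&",
    "lrm": "\u200E",  # proposed: LEFT-TO-RIGHT MARK
    "rlm": "\u200F",  # proposed: RIGHT-TO-LEFT MARK
}


def unescape_cue_text(text):
    """Replace known &name; escapes in one pass; leave unknown ones as-is."""
    def replace(match):
        return ESCAPES.get(match.group(1), match.group(0))
    return re.sub(r"&([a-z]+);", replace, text)


# The author types a visible escape instead of an invisible character.
cue = "&rlm;GPS \u05D4\u05D5\u05D0 \u05E9\u05D9\u05E8\u05D5\u05EA"
decoded = unescape_cue_text(cue)
print(decoded.startswith("\u200F"))  # True
```

The point is that the author sees and edits "&rlm;" as ordinary visible
text in any editor, and the invisible character only exists after parsing.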

Aharon

Received on Thursday, 8 December 2011 08:44:14 UTC