Re: WebVTT bidi: can we have &lrm; and &rlm; escapes?

I re-did my reading and yes, I think we may need them, if you want to avoid putting explicit <span>s around text that otherwise would get a correct directionality.

As I understand it, if UPPERCASE is right-to-left and lowercase is left-to-right, and you want an exclamation point at the (logical, visual) end of the embedded RTL text EFGH, then:

in memory:  abcd "EFGH!" ijkl.
renders: abcd "HGFE!" ijkl.  (wrong)

can be corrected:
in memory: abcd "EFGH!&rlm;" ijkl.
renders: abcd "!HGFE" ijkl.

or
in memory: abcd <span dir="rtl">"EFGH!"</span> ijkl.
renders: abcd "!HGFE" ijkl.

The verbosity of the second is troubling.  Would we allow raw LRM or RLM characters to be in a VTT file, or would we insist on the escapes only?
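
For concreteness, here is a minimal sketch of how a cue-text reader might decode the proposed escapes alongside the three the spec already allows. The table and the function name decode_cue_escapes are hypothetical, not taken from any WebVTT implementation; raw U+200E and U+200F characters in the input simply pass through unchanged.

```python
import re

# Hypothetical escape table: the three escapes in the current spec plus the
# two proposed in this thread. Not taken from any real WebVTT parser.
ESCAPES = {
    "&lt;": "<",
    "&gt;": ">",
    "&amp;": "&",
    "&lrm;": "\u200e",  # LEFT-TO-RIGHT MARK
    "&rlm;": "\u200f",  # RIGHT-TO-LEFT MARK
}

# One alternation over all known escapes, so decoding is a single pass and
# "&amp;lrm;" becomes the literal text "&lrm;" rather than a mark.
_ESCAPE_RE = re.compile("|".join(re.escape(name) for name in ESCAPES))


def decode_cue_escapes(text: str) -> str:
    """Replace each known escape with its character; leave everything else alone."""
    return _ESCAPE_RE.sub(lambda match: ESCAPES[match.group(0)], text)
```

Decoding in one pass is a deliberate choice here: it keeps an author able to write the escape text itself by escaping the ampersand.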

On Jan 4, 2012, at 9:01 , Aharon (Vladimir) Lanin wrote:

> ping?
> 
> On Thu, Dec 8, 2011 at 9:05 PM, Aharon (Vladimir) Lanin <aharon@google.com> wrote:
> Yes. I was not suggesting LRE, RLE, and PDF because these are not really human-usable. Even LRM and RLM push the envelope.
> 
> 
> On Thu, Dec 8, 2011 at 7:21 PM, Phillips, Addison <addison@lab126.com> wrote:
> Blar! I read the characters incorrectly. The marks are not sequence forming, and Kent is correct.
> 
> Addison
> 
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N WG)
> 
> Internationalization is not a feature.
> It is an architecture.
> 
> From: Kent Karlsson [mailto:kent.karlsson14@telia.com] 
> Sent: Thursday, December 08, 2011 8:58 AM
> To: Phillips, Addison; Aharon (Vladimir) Lanin; public-texttracks@w3.org; public-i18n-bidi@w3.org
> Subject: Re: WebVTT bidi: can we have &lrm; and &rlm; escapes?
> 
>  
> 
> These marks, no, they are not terminated by anything. They are freestanding.
> 
> LRE, LRO, RLE, and RLO are terminated (by PDF), since they do start a "span", but the marks don't.
> 
>     /Kent K
> 
> 
> Den 2011-12-08 17:48, skrev "Phillips, Addison" <addison@lab126.com>:
> 
> You need a third character: U+202C (PDF). Sequences starting with RLM or LRM are terminated using this character. See: http://www.w3.org/International/questions/qa-bidi-controls.en.php 
>  
> Addison
>  
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N WG)
>  
> Internationalization is not a feature.
> It is an architecture.
>  
>  
>  
> 
> From: Aharon (Vladimir) Lanin [mailto:aharon@google.com] 
> Sent: Thursday, December 08, 2011 12:43 AM
> To: public-texttracks@w3.org; public-i18n-bidi@w3.org
> Subject: WebVTT bidi: can we have &lrm; and &rlm; escapes?
> 
> 
> The WebVTT spec currently allows just three escapes: &lt;, &gt;, and &amp;. Authors are expected to enter any other characters directly by whatever other means they have at their disposal.
> 
> 
> 
> I would like to suggest that an exception is needed for two more characters, LRM and RLM. These are invisible characters with strong directionality, LTR for one and RTL for the other. These are used in bidi text in two ways:
> 
> 
> 
> - At the start of a paragraph, one of these can be used to indicate the paragraph's overall directionality in contexts where the directionality is determined by the paragraph's first character with strong direction. This is the default method of determining paragraph direction specified by the Unicode Bidirectional Algorithm - and the *only* method allowed by the current WebVTT spec. It is important to note that RTL languages fairly often use "words" spelled in LTR characters, e.g. acronyms like GPS and HTML (and WebVTT), as well as brand names. Occasionally, these occur as the first word in a sentence or even a paragraph, and when this is the case, the overall directionality of the paragraph is set incorrectly, unless one puts an RLM at the beginning of the paragraph.
> 
> 
> 
> - In bidi text, these characters provide some means of control over the visual ordering of the characters. For example, to get "Mamma Mia!" to come out that way - and not as "!Mamma Mia" - in RTL text, one can put an LRM after the exclamation mark. In HTML, there are other means of such control, such as wrapping opposite-direction phrases in <span dir=...> or in a <bdi> element. But such means are absent in WebVTT.
> 
> 
> 
> There are several reasons that I think an exception should be made for these characters and escapes provided for them in WebVTT:
> 
> 
> 
> 1. As mentioned above, WebVTT does not provide any means for controlling paragraph directionality or inline directionality explicitly. Thus, the author has no means but LRM and RLM for such control in a WebVTT file.
> 
> 
> 
> 2. LRM and RLM are invisible. Entering invisible characters and editing text that already contains them is confusing.
> 
> 
> 
> 3. The existing standard Hebrew and Arabic keyboards do not provide a means of generating an actual LRM or RLM. Although the Windows native TextBox control provides a context menu that allows inserting various special characters including LRM and RLM, and Microsoft Notepad uses TextBox and thus provides the same context menu, most reasonable text editors available on Windows (e.g. Notepad++) are not based on TextBox and do not provide a means for generating LRM and RLM. The same, as far as I know, is true for Linux (e.g. gedit).
> 
> 
> 
> Aharon
> 
> 
> 
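
The first-strong rule described in the quoted proposal can be sketched roughly as follows. This is a simplified illustration using Python's unicodedata module, and the function name first_strong_direction is hypothetical. It ignores the Unicode Bidirectional Algorithm's handling of embeddings and isolates, and only shows why a leading RLM changes the result for a paragraph that starts with a Latin acronym.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Return "ltr" or "rtl" based on the first strongly directional character.

    Simplified: does not skip over embedded or isolated runs as the full
    Unicode Bidirectional Algorithm does.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # No strong character found (e.g. only digits and punctuation).
    return default


# A Hebrew sentence that happens to begin with a Latin acronym.
cue = "GPS \u05e9\u05dc\u05d5\u05dd"
print(first_strong_direction(cue))        # direction without a mark
print(first_strong_direction(RLM + cue))  # direction with a leading RLM
```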

David Singer
Multimedia and Software Standards, Apple Inc.

Received on Wednesday, 11 January 2012 02:45:11 UTC