Re: WebVTT bidi: should a cue be allowed to contain more than one paragraph? from Aharon (Vladimir) Lanin on 2012-01-04 (public-i18n-bidi@w3.org from January to March 2012)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Wed, 4 Jan 2012 19:54:28 +0200
To: Simon Pieters <simonp@opera.com>
Cc: public-texttracks@w3.org, public-i18n-bidi@w3.org
Message-ID: <CA+FsOYbGcL-_WXA_WPsMwB83qzdcSrWAMRt+PpK8wqzRUBBBxA@mail.gmail.com>
I would like to come back to the original issue. Currently, the WebVTT
spec allows a cue's text to contain LF and other newline-type characters
that the Unicode standard defines as separating paragraphs for bidi
purposes, but states that to "determine the paragraph embedding level of
the cue", the renderer must apply the constraint that the cue text
"represents a single paragraph". This constraint is ambiguous and generally
strange. Its possible interpretations, in decreasing order of severity, are:

a: When applying the Unicode Bidirectional Algorithm to the cue text to
determine the visual order of the characters, the cue text has to be
treated as if its LF characters had been replaced with LS characters (which
do not separate paragraphs in Unicode).

b: When applying the Unicode Bidirectional Algorithm to the cue text to
determine the visual order of the characters, the cue text is allowed to
form separate bidi paragraphs, but the embedding level of all these
paragraphs is determined by the first strong directional character in
the whole cue, i.e. usually by the first paragraph.

c: When applying the Unicode Bidirectional Algorithm to the cue text to
determine the visual order of the characters, the cue text is allowed to
form separate bidi paragraphs, and each paragraph has its own embedding
level, determined by the first strong directional character in that
paragraph. However, if the cue's alignment is "start" or "end", the whole
cue is aligned according to the first strong directional character
in the whole cue, i.e. usually the first paragraph.
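
To make the three readings concrete, here is a minimal Python sketch of the
paragraph embedding levels each one would produce. It is an illustration
only, not spec text: the function names are hypothetical, the first-strong
scan is a simplified form of UAX #9 rules P2/P3, and paragraphs are split
at any character that Python's unicodedata module reports as bidi class B.

```python
import unicodedata

LF, LS = "\n", "\u2028"

def first_strong_level(text):
    """Return 1 (RTL) if the first strong character is R or AL, else 0 (LTR)."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return 0
        if cls in ("R", "AL"):
            return 1
    return 0  # no strong character found: default to LTR

def split_paragraphs(text):
    """Split at characters of bidi class B (LF, CR, NEL, PS, ...).

    Simplification: a CR LF pair yields an extra empty paragraph here.
    """
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    paragraphs.append("".join(current))
    return paragraphs

def paragraph_levels(cue, interpretation):
    """Embedding level of each bidi paragraph under reading a, b or c."""
    if interpretation == "a":
        # LF is treated as LS, which has class WS and does not split.
        cue = cue.replace(LF, LS)
        return [first_strong_level(p) for p in split_paragraphs(cue)]
    if interpretation == "b":
        # Separate paragraphs, all sharing the level of the whole cue.
        return [first_strong_level(cue)] * len(split_paragraphs(cue))
    if interpretation == "c":
        # Separate paragraphs, each with its own level.
        return [first_strong_level(p) for p in split_paragraphs(cue)]
    raise ValueError(interpretation)

# A Hebrew word, an LF, then an English word.
cue = "\u05e9\u05dc\u05d5\u05dd\nhello"
for name in "abc":
    print(name, paragraph_levels(cue, name))
```

Under reading c, the level used for "start" / "end" alignment would still be
first_strong_level(cue), i.e. the level of the whole cue.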

In my opinion, possibilities a and b are unreasonable. They would result in
visual orderings different from each other, and different from the ordering
that Unicode defines for plain text (which a WebVTT cue basically is).
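
The effect on visual order can be sketched with the example from the
earlier message quoted below, where uppercase Latin letters stand in for
RTL characters. The Python code is a toy, not a conforming implementation
of UAX #9: it assumes only letters, digits, spaces and punctuation, no
explicit embeddings, and implements just rules W7, N1/N2, I1/I2, L1 and L2.
All function names are hypothetical. With per_paragraph=False it processes
each line at the level of the whole cue (reading b); for this particular
input, reading a gives the same result.

```python
def bidi_class(ch):
    # Toy convention: uppercase = RTL letter, lowercase = LTR letter.
    if ch.isupper():
        return "R"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "EN"
    return "N"  # spaces and punctuation are all treated as neutrals

def first_strong_level(text):
    for ch in text:
        if bidi_class(ch) in ("L", "R"):
            return 0 if bidi_class(ch) == "L" else 1
    return 0

def visual_line(line, base):
    n = len(line)
    types = [bidi_class(c) for c in line]
    e = "R" if base % 2 else "L"  # embedding direction, also sos/eos
    # W7: a European number becomes L if the previous strong type is L.
    last_strong = e
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"
    # N1/N2: neutrals take the surrounding direction if both sides agree
    # (numbers count as R), otherwise the embedding direction.
    as_strong = lambda t: "R" if t == "EN" else t
    i = 0
    while i < n:
        if types[i] != "N":
            i += 1
            continue
        j = i
        while j < n and types[j] == "N":
            j += 1
        before = as_strong(types[i - 1]) if i > 0 else e
        after = as_strong(types[j]) if j < n else e
        types[i:j] = [before if before == after else e] * (j - i)
        i = j
    # I1/I2: resolve implicit levels.
    bump = {"L": 1, "R": 0, "EN": 1} if base % 2 else {"L": 0, "R": 1, "EN": 2}
    levels = [base + bump[t] for t in types]
    # L1: trailing whitespace goes back to the paragraph level.
    k = n
    while k > 0 and line[k - 1].isspace():
        k -= 1
        levels[k] = base
    # L2: reverse runs from the highest level down to the lowest odd level.
    chars = list(line)
    odd = [lv for lv in levels if lv % 2]
    for lvl in range(max(levels, default=0), min(odd, default=1) - 1, -1):
        i = 0
        while i < n:
            j = i
            while j < n and levels[j] >= lvl:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            i = max(j, i + 1)
    return "".join(chars)

def render(cue, per_paragraph):
    cue_level = first_strong_level(cue)
    return [visual_line(p, first_strong_level(p) if per_paragraph else cue_level)
            for p in cue.split("\n")]

cue = "THE FOOD WAS GOOD. HERE IS THE ADDRESS:\n50 main st."
print("\n".join(render(cue, per_paragraph=True)))
print("\n".join(render(cue, per_paragraph=False)))
```

The first call keeps the second line as "50 main st."; the second call
displays it as ".main st 50", matching the two renderings in the quoted
message.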

Possibility c is within reason. However, if that is what the spec intends,
it needs to be made clear, perhaps by stating that the single paragraph
constraint is only for the purpose of determining the alignment, not the
visual order of the cue's characters.

Alternatively, there is possibility d: drop the single-paragraph constraint
entirely, and let each paragraph have its own alignment. I do not have a
particular preference for d over c. It is just another possibility.
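
The difference between c and d is only in how "start" and "end" are mapped
to a physical side. A small Python sketch, with hypothetical function names
and assuming that LF is the only paragraph separator in the cue:

```python
import unicodedata

def first_strong_level(text):
    """Return 1 (RTL) if the first strong character is R or AL, else 0."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return 0
        if cls in ("R", "AL"):
            return 1
    return 0

def side(align, level):
    """Map "start" or "end" to a physical side for an embedding level."""
    if align not in ("start", "end"):
        raise ValueError(align)
    ltr = level % 2 == 0
    if align == "start":
        return "left" if ltr else "right"
    return "right" if ltr else "left"

def paragraph_sides(cue, align, per_paragraph):
    paragraphs = cue.split("\n")
    if per_paragraph:
        # Possibility d: every paragraph is aligned by its own direction.
        return [side(align, first_strong_level(p)) for p in paragraphs]
    # Possibility c: the whole cue is aligned by its first strong character.
    return [side(align, first_strong_level(cue))] * len(paragraphs)

cue = "\u05e9\u05dc\u05d5\u05dd\nhello"  # Hebrew line, then English line
print(paragraph_sides(cue, "start", per_paragraph=False))
print(paragraph_sides(cue, "start", per_paragraph=True))
```

For this cue with "start" alignment, c puts both lines on the right, while
d puts the Hebrew line on the right and the English line on the left.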

Aharon

On Thu, Dec 8, 2011 at 9:10 AM, Aharon (Vladimir) Lanin
<aharon@google.com> wrote:

> Even if my particular use-case (simultaneous Hebrew and English subtitles)
> is discounted, one still has to ask the question of why WebVTT explicitly
> allows LFs in a cue, while at the same time explicitly barring the LF from
> doing its normal job of separating the text into bidi paragraphs. Is
> there anything preventing a cue from having English text before an LF and
> French afterwards? Why then effectively prohibit switching from English to
> Hebrew (or vice-versa)?
>
> And even if we come to the conclusion that this is justified, we still
> have the issues of ambiguity / inconsistency in the current spec, as I
> described in points 1 and 2 in the original message.
>
> Aharon
>
>
> On Wed, Dec 7, 2011 at 4:22 PM, Simon Pieters <simonp@opera.com> wrote:
>
>> On Wed, 07 Dec 2011 14:20:34 +0100, Aharon (Vladimir) Lanin
>> <aharon@google.com> wrote:
>>
>>> The WebVTT cue text rendering rules currently require applying "the
>>> Unicode Bidirectional Algorithm's Paragraph Level steps" to a cue's text
>>> in order to "determine the paragraph embedding level of a cue", which is
>>> then used to determine the cue's direction (LTR or RTL), which is then
>>> used as the basis for the cue's alignment ("start" or "end"). These
>>> "Paragraph Level steps"
>>> (http://unicode.org/reports/tr9/#The_Paragraph_Level) start with a
>>> requirement to "split the text into separate paragraphs", where the
>>> "paragraphs are divided by the Paragraph Separator or appropriate Newline
>>> Function". This is referring to the Unicode characters PS (U+2029), LF,
>>> CR, NEL (U+0085), and a few others. Now, a cue's text is explicitly
>>> permitted to contain LF characters, which, in Unicode Bidirectional
>>> Algorithm terms, separate paragraphs; I imagine that other
>>> paragraph-separating characters are also allowed. So, it would seem then
>>> that a cue can contain several (bidi) paragraphs.
>>>
>>> However, the WebVTT spec does not appear to want that to be the case,
>>> and goes on to say that the cue text "represents a single paragraph".
>>> It is this restriction that currently allows it to talk about "the
>>> paragraph embedding level of a cue" (emphasis on "the"), and thus the
>>> direction of a cue.
>>>
>>> I find this specification problematic in several ways:
>>>
>>> 1. The Unicode Bidirectional Algorithm states that "Paragraphs may also
>>> be determined by higher-level protocols: for example, the text in two
>>> different cells of a table will be in different paragraphs." IMO, this
>>> allows a higher-level protocol - like WebVTT - to introduce paragraph
>>> boundaries besides those determined by the paragraph-separating
>>> characters. I am not at all sure that it allows a higher-level protocol
>>> to ignore paragraph boundaries already present in the cue text, which is
>>> implicit in WebVTT's insistence that a cue's text "represents a single
>>> paragraph". WebVTT can, of course, get rid of the paragraph-separating
>>> LFs (etc.) by replacing them with other characters (e.g. LS, U+2028)
>>> before handing the text over to the algorithm, but the WebVTT spec does
>>> not say to do so.
>>>
>>> 2. The WebVTT spec is unclear on whether the direction determined for the
>>> cue is only to be used as a basis for alignment, or if the cue text is
>>> actually to be *rendered* as a single paragraph in that direction. Please
>>> note that paragraph boundaries, and thus paragraph direction, are crucial
>>> to the correct display of bidirectional text. For example, let us
>>> consider the following text, where we represent RTL characters with
>>> uppercase Latin letters:
>>>
>>> THE FOOD WAS GOOD. HERE IS THE ADDRESS:
>>> 50 main st.
>>>
>>> I am assuming an LF between the two lines. The correct visual ordering
>>> for this text, as defined by "the Unicode Bidirectional Algorithm's
>>> Paragraph Level steps", is then (ignoring the issue of alignment):
>>>
>>> :SSERDDA EHT SI EREH .DOOG SAW DOOF EHT
>>> 50 main st.
>>>
>>> Note that the first line's colon is displayed on the left end. That's
>>> because the first line is an RTL paragraph (as determined by the
>>> "paragraph level steps" because it starts with an RTL character). Also
>>> note that the second line is an LTR paragraph. If the two lines were to
>>> be lumped into a single RTL paragraph, e.g. by replacing the LF with an
>>> LS, the result would be rather different:
>>>
>>> :SSERDDA EHT SI EREH .DOOG SAW DOOF EHT
>>> .main st 50
>>>
>>> Is this what the WebVTT spec currently requires? Or does it just want the
>>> correct display above, except with both lines aligned the same way?
>>>
>>> 3. I believe that there are use cases that require allowing a cue to
>>> contain more than one (bidi) paragraph. For example, there at least used
>>> to be a widespread practice in Israel for Hebrew-language films to come
>>> with subtitles that gave the dialogue in both the original Hebrew and in
>>> English translation, simultaneously on separate lines.
>>>
>>
>> The two languages don't need to be in the same cue. They don't even need
>> to be in the same file. You could have one track for Hebrew, another for
>> English, and enable both.
>>
>> Currently WebVTT does not support declaring language at all, although
>> there have been discussions to add it (per file, per block, per cue and/or
>> intra cue). Use cases presented in this area might inform both the
>> direction and language issues.
>>
>>
>>> For these reasons, I would suggest doing away with the concept of cue
>>> direction. A cue should be allowed to contain many (bidi) paragraphs, and
>>> each paragraph to determine its own direction. So, what do we do with
>>> alignment? Well, we could simply allow "start" and "end" to align each
>>> paragraph independently. If that is problematic (and I am not sure that
>>> this is actually the case), we could re-define "start" and "end" to mean
>>> the start and end side of the first non-empty paragraph. And if we wanted
>>> the application to decide which way to do it, we could define additional
>>> alignment values, e.g. "first-start" and "first-end" in addition to
>>> "start" and "end".
>>>
>>> Aharon
>>>
>>
>>
>> --
>> Simon Pieters
>> Opera Software
>>
>
>
Received on Wednesday, 4 January 2012 17:55:21 UTC