WebVTT bidi: should a cue be allowed to contain more than one paragraph? from Aharon (Vladimir) Lanin on 2011-12-07 (public-texttracks@w3.org from December 2011)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Wed, 7 Dec 2011 15:20:34 +0200
To: public-texttracks@w3.org
Message-ID: <CA+FsOYZ7+ucX+2FVW7snk3AiP4pe4LW5Zep5hTPTHBuzrzfHRg@mail.gmail.com>

The WebVTT cue text rendering rules currently require applying "the Unicode
Bidirectional Algorithm's Paragraph Level steps" to a cue's text in order
to "determine the paragraph embedding level of a cue", which is then used
to determine the cue's direction (LTR or RTL), which is then used as the
basis for the cue's alignment ("start" or "end"). These "Paragraph Level
steps" (http://unicode.org/reports/tr9/#The_Paragraph_Level) start with a
requirement to "split the text into separate paragraphs", where the
"paragraphs are divided by the Paragraph Separator or appropriate Newline
Function". This is referring to the Unicode characters PS (U+2029), LF, CR,
NEL (U+0085), and a few others. Now, a cue's text is explicitly permitted
to contain LF characters, which, in Unicode Bidirectional Algorithm terms,
separate paragraphs; I imagine that other paragraph-separating characters
are also allowed. It would seem, then, that a cue can contain several
(bidi) paragraphs.
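
To make the splitting step concrete, here is a minimal Python sketch of how
a cue's text breaks into bidi paragraphs and how each paragraph gets its
direction from its first strong character. The function names are
hypothetical, the sample text is an assumption, and the sketch simplifies
the algorithm: it drops the separator characters instead of keeping them
with the preceding paragraph, and it ignores explicit directional
formatting characters.

```python
import unicodedata

def split_bidi_paragraphs(text):
    """Split text at characters of bidi class B (LF, CR, NEL, PS, ...)."""
    paragraphs, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
            # A CR immediately followed by LF counts as one separator.
            if ch == "\r" and i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
        else:
            current.append(ch)
        i += 1
    if current:
        paragraphs.append("".join(current))
    return paragraphs

def paragraph_direction(paragraph, default="ltr"):
    """Return "rtl" or "ltr" based on the first strong character."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found

# Assumed sample: a Hebrew line, an LF, then an English line.
cue_text = "\u05d4\u05d0\u05d5\u05db\u05dc \u05d8\u05d5\u05d1:\n50 main st."
directions = [paragraph_direction(p) for p in split_bidi_paragraphs(cue_text)]
print(directions)  # ['rtl', 'ltr']
```

Note that LS (U+2028) has bidi class WS, not B, so it does not start a new
paragraph in this sketch.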

However, the WebVTT spec apparently does not currently want that to be the
case, and goes on to say that the cue text "represents a single paragraph".
It is this restriction that currently allows it to talk about "the
paragraph embedding level of a cue" (emphasis on "the"), and thus the
direction of a cue.

I find this specification problematic in several ways:

1. The Unicode Bidirectional Algorithm states that "Paragraphs may also be
determined by higher-level protocols: for example, the text in two
different cells of a table will be in different paragraphs." IMO, this
allows a higher level protocol - like WebVTT - to introduce paragraph
boundaries besides those determined by the paragraph-separating characters.
I am not at all sure that it allows a higher-level protocol to ignore
paragraph boundaries already present in the cue text, which is implicit in
WebVTT's insistence that a cue's text "represents a single paragraph".
WebVTT can, of course, get rid of the paragraph-separating LFs (etc.) by
replacing them with other characters (e.g. LS, U+2028) before handing the
text over to the algorithm, but the WebVTT spec does not say to do so.
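
As an illustration of that replacement, the following Python sketch turns
every paragraph-separating character into LS before the text would be
handed to the bidi algorithm. The function name is hypothetical, and
nothing in the WebVTT spec currently prescribes this step; it only shows
what such a preprocessing rule could look like.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"  # LS: breaks the line but not the bidi paragraph

def join_as_single_paragraph(cue_text):
    """Replace each paragraph separator (bidi class B) with LS."""
    # Collapse CR LF first so the pair yields a single LS.
    cue_text = cue_text.replace("\r\n", "\n")
    return "".join(
        LINE_SEPARATOR if unicodedata.bidirectional(ch) == "B" else ch
        for ch in cue_text
    )
```

After this step the whole cue is one bidi paragraph, so a single paragraph
embedding level applies to every line in it.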

2. The WebVTT spec is unclear on whether the direction determined for the
cue is only to be used as a basis for alignment, or if the cue text is
actually to be *rendered* as a single paragraph in that direction. Please
note that paragraph boundaries, and thus paragraph direction, are crucial
to the correct display of bidirectional text. For example, let us consider
the following text, where we represent RTL characters with uppercase Latin
letters:

THE FOOD WAS GOOD. HERE IS THE ADDRESS:
50 main st.

I am assuming an LF between the two lines. The correct visual ordering for
this text, as defined by "the Unicode Bidirectional Algorithm's Paragraph
Level steps" is then (ignoring the issue of alignment):

:SSERDDA EHT SI EREH .DOOG SAW DOOF EHT
50 main st.

Note that the first line's colon is displayed on the left end. That's
because the first line is an RTL paragraph (as determined by the "paragraph
level steps" because it starts with an RTL character). Also note that the
second line is an LTR paragraph. If the two lines were to be lumped into a
single RTL paragraph, e.g. by replacing the LF with an LS, the result would
be rather different:

:SSERDDA EHT SI EREH .DOOG SAW DOOF EHT
.main st 50

Is this what the WebVTT spec currently requires? Or does it just want the
correct display above, except with both lines aligned the same way?

3. I believe that there are use cases that require allowing a cue to
contain more than one (bidi) paragraph. For example, there was, at least at
one time, a widespread practice in Israel for Hebrew-language films to come
with subtitles that gave the dialogue in both the original Hebrew and an
English translation, simultaneously on separate lines.

For these reasons, I would suggest doing away with the concept of cue
direction. A cue should be allowed to contain many (bidi) paragraphs, and
each paragraph should determine its own direction. So, what do we do with
alignment? Well, we could simply allow "start" and "end" to align each
paragraph independently. If that is problematic (and I am not sure that
this is actually the case), we could re-define "start" and "end" to mean
the start and end side of the first non-empty paragraph. And if we wanted
the application to decide which way to do it, we could define additional
alignment values, e.g. "first-start" and "first-end" in addition to "start"
and "end".
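
The alignment options above can be sketched in Python as follows. This is
only one possible reading of the proposal: the function names are
hypothetical, "first-start" and "first-end" are the suggested new values
and not existing WebVTT settings, and treating whitespace-only paragraphs
as empty is an assumption.

```python
import unicodedata

def first_strong_direction(paragraph, default="ltr"):
    """Return "rtl" or "ltr" based on the first strong character."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default

def resolve_alignment(align, paragraphs):
    """Return the physical side ("left" or "right") for each paragraph."""
    directions = [first_strong_direction(p) for p in paragraphs]
    if align in ("first-start", "first-end"):
        # Every paragraph follows the first non-empty paragraph.
        base = next(
            (d for p, d in zip(paragraphs, directions) if p.strip()), "ltr"
        )
        directions = [base] * len(paragraphs)
        edge = align[len("first-"):]
    elif align in ("start", "end"):
        # Each paragraph is aligned according to its own direction.
        edge = align
    else:
        raise ValueError(f"unsupported alignment: {align!r}")
    if edge == "start":
        side = {"ltr": "left", "rtl": "right"}
    else:
        side = {"ltr": "right", "rtl": "left"}
    return [side[d] for d in directions]

# A bilingual cue: Hebrew on the first line, English on the second.
paragraphs = ["\u05e9\u05dc\u05d5\u05dd", "hello"]
print(resolve_alignment("start", paragraphs))        # ['right', 'left']
print(resolve_alignment("first-start", paragraphs))  # ['right', 'right']
```

With "start", the Hebrew and English lines land on opposite sides; with
"first-start", both follow the Hebrew line.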

Aharon
Received on Wednesday, 7 December 2011 13:21:24 UTC