Re: WebVTT bidi: should a cue be allowed to contain more than one paragraph? from Aharon (Vladimir) Lanin on 2011-12-08 (public-i18n-bidi@w3.org from October to December 2011)

From: Aharon (Vladimir) Lanin <aharon@google.com>
Date: Thu, 8 Dec 2011 08:59:08 +0200
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Cc: Simon Pieters <simonp@opera.com>, public-texttracks@w3.org, public-i18n-bidi@w3.org
Message-ID: <CA+FsOYYL4JvpyHYfXKRrE6WfBd+jw50hp72sScO8ZP4Gu8F=Lg@mail.gmail.com>

On Wed, Dec 7, 2011 at 11:18 PM, Silvia Pfeiffer
<silviapfeiffer1@gmail.com> wrote:

> On Thu, Dec 8, 2011 at 1:22 AM, Simon Pieters <simonp@opera.com> wrote:
> > On Wed, 07 Dec 2011 14:20:34 +0100, Aharon (Vladimir) Lanin
> > <aharon@google.com> wrote:
> >> 3. I believe that there are use cases that require allowing a cue to
> >> contain more than one (bidi) paragraph. For example, there at least used
> >> to be a widespread practice in Israel for Hebrew-language films to come
> >> with subtitles that gave the dialogue in both the original Hebrew and in
> >> English translation, simultaneously on separate lines.
> >
> >
> > The two languages don't need to be in the same cue. They don't even need
> > to be in the same file. You could have one track for Hebrew, another for
> > English, and enable both.
>

> From the point of view of authoring that information, two WebVTT files
> work. However, the HTML spec says that only one track of kind captions
> or subtitles can be showing at any point in time
> (http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#sourcing-out-of-band-text-tracks),
> so display of two languages would require one to be subtitles and
> the other to be metadata and rendered by JavaScript. I would therefore
> think that they should be in the same subtitle file. However, I would
> also recommend making them different cues, even if they are fully
> synchronized. At least that way you can position them better.
>
>
I am not sure that any positioning beyond a plain line break is needed.
Having to repeat all the timing info, etc., and to offset the vertical
position (by an amount that depends on the length of the first text) for
each cue is a huge pain.

>
> > Currently WebVTT does not support declaring language at all, although
> > there have been discussions to add it (per file, per block, per cue and/or
> > intra cue). Use cases presented in this area might inform both the
> > direction and language issues.
>
> To satisfy Aharon's use case with both Hebrew and English on screen -
> even if provided in different cues - we will at minimum need a means
> to specify language per cue. Probably more flexible would be a means
> to associate a block of text within a cue with a language.
>

I certainly support allowing the WebVTT file to specify the language at the
file, cue, and perhaps even sub-cue level. (The last is necessary if you
have subtitles of what is being said in the video, and the speakers
sometimes slip in phrases in another language.)

However, specifying the language is not necessary to deal with
directionality. The current WebVTT spec says that directionality (per cue)
is determined using the paragraph level steps specified in the Unicode Bidi
Algorithm, which basically just looks for the paragraph's first character
with strong direction. (The Unicode tables specify a direction class for
all characters.) This allows directionality to be determined from the text
itself, without the author having to specify it or the language explicitly.
This is a fairly strong feature that I would not want to lose. I have a
hunch that authors will often forget to specify the language in WebVTT,
just like in HTML.
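
To make the first-strong rule concrete, here is a rough sketch in Python. It
is only an illustration, not the WebVTT or Unicode reference algorithm: the
function name and the "ltr" default are my own choices, and it ignores the
finer points of the Unicode Bidi Algorithm (such as skipping text inside
directional isolates). It relies only on the bidi class that Python's
standard unicodedata module reports for each character.

```python
import unicodedata


def first_strong_direction(paragraph: str, default: str = "ltr") -> str:
    """Return "ltr" or "rtl" based on the first strongly directional character.

    Hypothetical helper for illustration. Strong classes are L (left-to-right)
    and R or AL (right-to-left). Digits, spaces and punctuation are not strong
    and are skipped. If no strong character is found, `default` is returned.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default
```

With this sketch, a cue line that starts with Hebrew comes out as "rtl" even
if English follows, and a line that starts with English comes out as "ltr",
with no language or direction declared by the author.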

BTW, in HTML, directionality is specified using the dir attribute (which
can now be "auto" in order to let the first strongly directional character
in the element determine it), but is entirely unaffected by the lang
attribute.

> The use case for a per file language declaration is that not every
> webvtt file will be consumed by a Web browser, so the language can't
> be provided in a @srclang attribute of the "glue" markup and has to be
> provided elsewhere. Since this just sets the default language for the
> file, it does not solve the issues with bidi within a single cue's
> paragraph.
>
> Silvia.
>

Received on Thursday, 8 December 2011 07:00:07 UTC