
[Bug 28266] [webvtt] 6.2.1 processing model handling of bidi [I18N-ISSUE-432]

From: <bugzilla@jessica.w3.org>
Date: Fri, 02 Oct 2015 13:30:53 +0000
To: www-international@w3.org
Message-ID: <bug-28266-4285-W3MJlvfPU1@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28266

--- Comment #24 from Richard Ishida <ishida@w3.org> ---
This post is going to get a little messy, but I suspect it is worth responding
to your comments.  I've been making notes and examples to resummarise my
understanding and clarify some remaining issues related to this heuristic
approach, but I'd suggest that we plan a teleconference call to go over this. 
It's likely that the ability to ask questions and adjust on the fly will
significantly shorten the process.

(In reply to Silvia Pfeiffer from comment #16)
> (In reply to Richard Ishida from comment #15)

> > [1]
> > if i understand correctly, the current approach establishes the base
> > direction of the lines of cue text by assuming that the text within a cue
> > will behave as if CSS unicode-bidi: plain-text was applied, ie.
> > for each paragraph (ie. line in WebVTT)
> 
> NOTE: it's not for a line in WebVTT, but for all lines in a cue (i.e. a
> paragraph)
> 
> 
> >, find the first strong character and
> > set the base direction per the direction of that character.
> > 
> > in principle, this works for setting direction at the per-paragraph level
> > unless you have
> > (a) a line that should be rtl, but starts with non-rtl characters (and vice
> > versa),
> > (b) a line with no strong character (such as a telephone number) or a
> > mixture of strong and non-strong characters (such as a MAC address) but that
> > has to be ordered in a particular way.
> > 
> > authors would have to look for all such cases and add either &rlm; or &lrm;
> > to the start of the line to create the desired display.
> 
> (replace "line" with "paragraph" everywhere)
> Yes, that's the idea.
> 
> 
> > [1a]
> > actually, i'm not sure it's quite as simple as that, since much of the spec
> > text seems to concern itself with the direction of the first line in the
> > cue, with an implication that the direction determined from that will be
> > applied to any remaining paragraphs in that cue. This would mean that if you
> > had a line in English, the direction of that line would be rtl if the
> > preceding line started with, say, Arabic. I'm struggling a little to see the
> > bigger picture due to the complexity and algorithm-heavy nature of the spec,
> > so apologies if i'm missing something.
> 
> Just think of all the lines in a cue as a "paragraph" and apply
> directionality that way.

This is very different to the way the bidi algorithm works in Unicode and in
CSS. When autodetecting the direction of some text, that direction is applied
to the Unicode definition of a paragraph. Such a paragraph terminates with a
line break.

This approach will also cause difficulties for multilingual cues, such as

00:18.000 --> 00:20.000
שלום עליכם!
Hello!

where the exclamation mark must appear to the left of the first line and to the
right of the second.

The implementation will need to go out of its way to override this normal
behaviour so that the direction of the first line alone governs the whole cue,
and as a result authors will need to do something rather unusual to undo what
the implementation did in cases such as the one above. I'm not sure why the
spec makes the implementation do extra work to set the direction according to
the first strong character in the cue rather than the first strong character
of each line, which is what the Unicode Bidirectional Algorithm (UBA) would
normally do.
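
To make the difference concrete, here is a rough Python sketch of the two
behaviours applied to the cue above. It is only an illustration, not spec
text: the function names are my own, and the first-strong check is simplified
(it ignores the UBA rule about skipping characters inside isolates).

```python
import unicodedata

# Bidi classes that count as "strong" for the first-strong heuristic.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text, default="ltr"):
    """Return 'ltr' or 'rtl' based on the first strong character in text.

    Falls back to `default` when the text has no strong character
    (for example, a telephone number).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return default


def per_line_directions(cue_text):
    """Each line is its own paragraph, as in the UBA and CSS plaintext."""
    return [first_strong_direction(line) for line in cue_text.split("\n")]


def per_cue_directions(cue_text):
    """The whole cue is one 'paragraph': first strong character wins for all lines."""
    lines = cue_text.split("\n")
    direction = first_strong_direction(cue_text)
    return [direction] * len(lines)


cue = "שלום עליכם!\nHello!"
print(per_line_directions(cue))  # ['rtl', 'ltr']
print(per_cue_directions(cue))   # ['rtl', 'rtl']
```

With the per-cue behaviour the English line inherits rtl, so its exclamation
mark ends up on the wrong side unless the author intervenes.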



> > [2]
> > if WebVTT instead adds the ability to say
> > 
> > STYLE
> > direction:rtl;
> > 
> > then the default base direction for the content is established by that
> > statement, and all lines of cue text should get a base direction of rtl,
> > regardless of their first-strong character, unless some lower level
> > directive intervenes. The important thing to bear in mind is that this
> > approach is incompatible with first-strong heuristics, and &rlm; or &lrm; at
> > the start of the para are of no consequence.
> 
> Seeing as the first-strong heuristics apply to the whole cue (all of the
> lines), does that change your opinion?

Not at all. Compare what happens in HTML, if that helps.  If you set dir=rtl on
a div containing two p elements, the base direction for each of those p
elements is set to rtl.  I can only assume that if the first-strong heuristics
apply 'to the whole cue (all of the lines)', then the same base direction is
propagated to all those lines, and any &rlm; or &lrm; has no effect.

> 
> > When you have paragraphs/lines that should not have a direction of rtl (like
> > those mentioned above) you need a way to change their base direction using
> > some kind of metadata annotation, on a per paragraph basis.
> 
> &lrm; and &rlm; can do that within a cue.

&lrm; and &rlm; can't set the base direction when it is set declaratively,
since the initial strong character is not examined.  Try it in HTML. Make a p
element, add some directionally-sensitive text and add/remove &rlm; or &lrm; to
the start. There'll be no difference.  (Don't confuse RLM/LRM with RLI, RLE,
LRI, etc.)
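
The same point as a small Python sketch, in case it helps. This is a
simplified model of how a base direction gets resolved, not what any
particular browser does internally; the `declared` parameter is a
hypothetical stand-in for something like dir=rtl in HTML or the proposed
direction setting in a STYLE block.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, bidi class L
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, bidi class R


def resolve_base_direction(text, declared=None):
    """Resolve the base direction of one paragraph.

    If a direction is declared, it is used as-is and the text is never
    examined. Only when nothing is declared does the first strong
    character (which may be an LRM or RLM) decide the direction.
    """
    if declared in ("ltr", "rtl"):
        return declared
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "ltr"  # no strong character found


phone = "+1 555 0100"

# Auto-detection: a leading RLM changes the result.
print(resolve_base_direction(phone))        # ltr
print(resolve_base_direction(RLM + phone))  # rtl

# Declared direction: the leading mark makes no difference.
print(resolve_base_direction(phone, declared="rtl"))        # rtl
print(resolve_base_direction(LRM + phone, declared="rtl"))  # rtl
```

The mark still behaves as a strong character for ordering the neighbouring
neutrals, but it does not change the base direction of the paragraph once
that has been declared.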

> 
> > one could probably easily enough allow for some metadata declaration at the
> > cue level to change the direction of content, however it is actually
> > necessary to be able to change the direction of content for any
> > paragraph/line level, eg. it may be the second line in the cue that has to
> > be set to ltr. Since lines in WebVTT cues are not bounded by markup, i'm not
> > sure how one would do this using metadata/markup.
> 
> Lines in WebVTT cues are considered as a block, so they are bound. Also, you
> can use markup with a class span.

see above

-- 
You are receiving this mail because:
You are on the CC list for the bug.
Received on Friday, 2 October 2015 13:30:56 UTC
