Re: WebVTT Spec Question

On Sat, Mar 9, 2013 at 7:28 AM, David Ronca <dronca@netflix.com> wrote:

> The spec gives the following definition of the WebVTT header:
>
>    1. An optional U+FEFF BYTE ORDER MARK (BOM) character.
>    2. The string "WEBVTT".
>    3. Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER
>    TABULATION (tab) character followed by any number of characters that are
>    not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) characters.
>    4. Two or more WebVTT line terminators
>    <http://dev.w3.org/html5/webvtt/#webvtt-line-terminator>.
>
> This suggests that a valid header looks like this:
>
> WEBVTT[<space>optional text]
>
> <cr>|<lf>|<cr><lf>
>
> <cr>|<lf>|<cr><lf>
>


Your layout implies an additional cr/lf at the end of the WEBVTT line, when
in fact it is this:

WEBVTT[<space>optional text]<cr>|<lf>|<cr><lf>

<cr>|<lf>|<cr><lf>
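
For illustration, the four-part header definition quoted above can be sketched as a regular expression. This is only a rough Python sketch, not text from the spec; the names LINE_TERMINATOR, HEADER_RE and has_valid_header are made up for this example.

```python
import re

# One WebVTT line terminator: CRLF, a lone LF, or a lone CR.
# The lookahead keeps a CRLF pair from being counted as two terminators.
LINE_TERMINATOR = r"(?:\r\n|\n|\r(?!\n))"

# Hypothetical pattern mirroring the four items of the quoted definition.
HEADER_RE = re.compile(
    "^\ufeff?WEBVTT"            # 1. optional BOM, 2. the string "WEBVTT"
    r"(?:[ \t][^\n\r]*)?"       # 3. optional space/tab plus text on the same line
    + LINE_TERMINATOR + "{2,}"  # 4. two or more line terminators
)


def has_valid_header(text):
    """Return True if text starts with a header matching the definition above."""
    return HEADER_RE.match(text) is not None
```

Under this sketch, "WEBVTT optional text\r\n\r\n" is accepted, while "WEBVTT\r\n" alone is rejected because it carries only one line terminator.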


> But the parsing section describes the initial parsing:
>
> 1. The character indicated by position is a U+000A LINE FEED (LF)
>    character. Advance position to the next character in input.
>
> 2. *Header*: Collect a sequence of characters
>    <http://dev.w3.org/html5/webvtt/#collect-a-sequence-of-characters>
>    that are *not* U+000A LINE FEED (LF) characters. Let line be those
>    characters, if any.
>
> 3. If position is past the end of input, then jump to the step
>    labeled *end*.
>
> 4. The character indicated by position is a U+000A LINE FEED (LF)
>    character. Advance position to the next character in input.
>
> 5. If line contains the three-character substring "-->" (U+002D
>    HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN), then set
>    the already collected line flag and jump to the step labeled *cue loop*.
>
> 6. If line is not the empty string, then jump back to the step
>    labeled *header*.
>
> #6 suggests that a header can look like this:
>
> WEBVTT[<space>optional text]
>
> <cr>|<lf>|<cr><lf>
>
> Some text
>
> <cr>|<lf>|<cr><lf>
>
> Some text
>
> …
>
> <cr>|<lf>|<cr><lf>
>
> <cr>|<lf>|<cr><lf>
>


Yeah, more like this:

WEBVTT[<space>optional text]<cr>|<lf>|<cr><lf>

Some text<cr>|<lf>|<cr><lf>

Some text<cr>|<lf>|<cr><lf>

…

Some text<cr>|<lf>|<cr><lf>
             <cr>|<lf>|<cr><lf>


>
> That is, there can be many lines of header until there are 2 successive
> empty lines
>

Not two - just one empty line.
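
To make that concrete, here is a rough Python sketch of how a parser might collect the header block. It is a simplification of the quoted parsing steps, not the spec's algorithm: the name split_header is made up, it assumes line terminators may be normalized to LF up front, and it skips BOM handling and validation of the WEBVTT line.

```python
def split_header(text):
    """Split WebVTT text into (header_lines, remaining_lines).

    Simplified sketch: no BOM handling and no check of the signature line.
    """
    # Assumption: normalize CRLF and lone CR to LF so one terminator remains.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    header = [lines[0]]  # the "WEBVTT[ optional text]" line
    i = 1
    while i < len(lines):
        line = lines[i]
        if "-->" in line:
            # A cue timing line directly after the header: leave it unconsumed.
            break
        i += 1
        if line == "":
            # A single empty line ends the header block.
            break
        header.append(line)  # any other line is kept as header text
    return header, lines[i:]
```

With input such as "WEBVTT\nSome text\nSome text\n\n00:00.000 --> 00:01.000\nHi", the sketch returns the WEBVTT line plus both "Some text" lines as the header and starts the remainder at the cue timing line.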


>
> Any thoughts on the correct way to interpret this [possible] conflict?
>

No, there is no conflict. The first is the syntax definition in the current
spec; the second is the requirement on how to parse a file, written so that
the current spec can be extended in the future.

Indeed, we have made use of header metadata lines in this spec:
https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html

HTH.

Cheers,
Silvia.

Received on Friday, 8 March 2013 21:11:23 UTC