[webvtt] Tokenizer doesn't parse escapes in annotation

zcorpan has just created a new issue for 
https://github.com/w3c/webvtt:

== Tokenizer doesn't parse escapes in annotation ==
https://w3c.github.io/webvtt/#webvtt-start-tag-annotation-state

> WebVTT start tag annotation state
> Jump to the entry that matches the value of c:
>
> U+003E GREATER-THAN SIGN character (>)
> Advance position to the next character in input, then jump to the 
next "end-of-file marker" entry below.
>
> End-of-file marker
> Remove any leading or trailing space characters from buffer, and 
replace any sequence of one or more consecutive space characters in 
buffer with a single U+0020 SPACE character; then, return a start tag 
whose tag name is result, with the classes given in classes, and with 
buffer as the annotation, and abort these steps.
>
> Anything else
> Append c to buffer and jump to the step labeled next.
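
For illustration, the quoted state can be sketched roughly as follows in Python. The function name `annotation_state`, its `(text, position)` signature, and the `SPACE_CHARS` set are assumptions made for this sketch, not names from the spec, and only the three entries quoted above are modelled.

```python
import re

# Assumption: "space characters" means U+0020, tab, LF, FF and CR.
SPACE_CHARS = " \t\n\f\r"


def annotation_state(text, position):
    """Sketch of the quoted annotation state.

    Consumes characters of `text` starting at `position` and returns
    a tuple (annotation, new_position).
    """
    buffer = []
    while position < len(text):
        c = text[position]
        if c == ">":
            # U+003E: advance past it, then handle as end-of-file.
            position += 1
            break
        # "Anything else": append c verbatim. An escape such as "&amp;"
        # is copied character by character and never decoded.
        buffer.append(c)
        position += 1
    # End-of-file entry: trim, then collapse runs of space characters.
    annotation = "".join(buffer).strip(SPACE_CHARS)
    annotation = re.sub("[" + SPACE_CHARS + "]+", " ", annotation)
    return annotation, position
```

With this sketch, `annotation_state("Fred &amp; Wilma>", 0)` returns the annotation `Fred &amp; Wilma` with the escape left as-is.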

AFAICT, the annotation state above doesn't tokenize escapes: they are 
appended to buffer verbatim. But the syntax allows escapes in annotations:

https://w3c.github.io/webvtt/#webvtt-cue-span-start-tag

> If the start tag requires an annotation: a U+0020 SPACE character or 
a U+0009 CHARACTER TABULATION (tab) character, followed by one or more 
of the following components, the concatenation of their 
representations having a value that contains at least one character 
other than U+0020 SPACE and U+0009 CHARACTER TABULATION (tab) 
characters:
>
> WebVTT cue span start tag annotation text, representing the text of 
the annotation.
> A WebVTT cue amp escape, representing a "&" character in the text of
 the annotation.
> A WebVTT cue lt escape, representing a "<" character in the text of 
the annotation.
> A WebVTT cue gt escape, representing a ">" character in the text of 
the annotation.
> A WebVTT cue lrm escape, representing a U+200E LEFT-TO-RIGHT MARK 
Unicode bidirectional formatting character in the text of the cue.
> A WebVTT cue rlm escape, representing a U+200F RIGHT-TO-LEFT MARK 
Unicode bidirectional formatting character in the text of the cue.
> A WebVTT cue nbsp escape, representing a U+00A0 NO-BREAK SPACE 
character in the text of the cue.
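
As a rough sketch of what decoding these escapes in an annotation could look like (in Python), the snippet below maps each escape named in the quoted syntax to its character. The names `CUE_ESCAPES` and `decode_annotation_escapes` are hypothetical, and doing this as a pass over the finished annotation string is an assumption of the sketch. The issue does not say where in the tokenizer the fix should go.

```python
import re

# The six escapes named in the quoted syntax and what they represent.
CUE_ESCAPES = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&lrm;": "\u200e",   # LEFT-TO-RIGHT MARK
    "&rlm;": "\u200f",   # RIGHT-TO-LEFT MARK
    "&nbsp;": "\u00a0",  # NO-BREAK SPACE
}

# One alternation over all escapes, so the string is scanned in a single
# pass and already-decoded output is never decoded a second time.
_ESCAPE_RE = re.compile("|".join(re.escape(e) for e in CUE_ESCAPES))


def decode_annotation_escapes(annotation):
    """Replace each known escape in `annotation` with its character.

    Hypothetical helper: anything that is not one of the six escapes,
    including a stray "&", is left untouched.
    """
    return _ESCAPE_RE.sub(lambda m: CUE_ESCAPES[m.group(0)], annotation)
```

For example, `decode_annotation_escapes("Fred &amp; Wilma")` gives `Fred & Wilma`, while `&amp;lt;` decodes to the literal text `&lt;` because the scan is single-pass.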

See https://github.com/w3c/webvtt/issues/252

Received on Thursday, 12 November 2015 12:34:27 UTC