- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Fri, 14 Jun 2013 17:19:05 +1000
- To: Pierre-Anthony Lemieux <pal@sandflow.com>
- Cc: public-html <public-html@w3.org>
Ah yes, you are correct. And indeed, these two sections need adjusting.
Can you register a bug so we don't forget?

Thanks,
Silvia.

On Fri, Jun 14, 2013 at 2:56 PM, Pierre-Anthony Lemieux <pal@sandflow.com> wrote:
>> You may be looking at HTML5.0. HTML5.1 doesn't contain these any more.
>
> I pulled these two paragraphs from [1], which is HTML 5.1 nightly, right?
>
> [1] http://www.w3.org/html/wg/drafts/html/master/single-page.html
>
> -- Pierre
>
> On Thu, Jun 13, 2013 at 9:54 PM, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>> On Fri, Jun 14, 2013 at 1:50 AM, Pierre-Anthony Lemieux
>> <pal@sandflow.com> wrote:
>>> Hi Silvia,
>>>
>>> I like the idea of making the HTML cue interface independent from the
>>> underlying serialization format, and moving discussions on the latter to
>>> the TTWG, as suggested by others.
>>
>> So you agree that this group should rename TextTrackCue to AbstractCue
>> (or just Cue) and TextTrackCueList to CueList?
>>
>>
>>> In fact, along the same lines, I would move paragraphs [a] and [b]
>>> (see below) of Section 4.8.9 to the WebVTT specification. I think this
>>> would remove the last normative provisions tied to a specific
>>> serialization format.
>>
>> You may be looking at HTML5.0. HTML5.1 doesn't contain these any more.
>>
>> I would indeed suggest that we adjust HTML5.0 to contain the same text
>> as HTML5.1 for tracks.
>>
>>> Hope it makes sense.
>>
>> Indeed.
>> Thanks,
>> Silvia.
>>
>>
>>> Best,
>>>
>>> -- Pierre
>>>
>>> [a] If the element's track URL identifies a WebVTT resource, and the
>>> element's kind attribute is not in the metadata state, then the WebVTT
>>> file must be a WebVTT file using cue text. [WEBVTT]
>>>
>>> [b] Furthermore, if the element's track URL identifies a WebVTT
>>> resource, and the element's kind attribute is in the chapters state,
>>> then the WebVTT file must be both a WebVTT file using chapter title
>>> text and a WebVTT file using only nested cues. [WEBVTT]
>>>
>>> On Tue, Jun 11, 2013 at 10:11 PM, Silvia Pfeiffer
>>> <silviapfeiffer1@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> The model in which we have looked at text tracks (<track> element of
>>>> media elements) thus far has some issues that I would like to point
>>>> out in this email, and I would like to suggest a new way to look at
>>>> tracks. This will result in changes to the HTML and WebVTT specs and
>>>> has an influence on others specifying text track cue formats, so I am
>>>> sharing this information widely.
>>>>
>>>> Current situation
>>>> =================
>>>> Text tracks provide lists of timed cues for media elements, i.e. they
>>>> have a start time, an end time, and some content that is to be
>>>> interpreted in sync with the media element's timeline.
>>>>
>>>> WebVTT is the file format that we chose to define as a serialisation
>>>> for the cues (just like audio files serialize audio samples/frames and
>>>> video files serialize video frames).
>>>>
>>>> The way we currently parse WebVTT files into JS objects has us
>>>> create objects of type WebVTTCue. These objects contain information
>>>> about any kind of cue that could be included in a WebVTT file:
>>>> captions, subtitles, descriptions, chapters, metadata and whatnot.
>>>>
>>>> The WebVTTCue object looks like this:
>>>>
>>>>   enum AutoKeyword { "auto" };
>>>>   [Constructor(double startTime, double endTime, DOMString text)]
>>>>   interface WebVTTCue : TextTrackCue {
>>>>     attribute DOMString vertical;
>>>>     attribute boolean snapToLines;
>>>>     attribute (long or AutoKeyword) line;
>>>>     attribute long position;
>>>>     attribute long size;
>>>>     attribute DOMString align;
>>>>     attribute DOMString text;
>>>>     DocumentFragment getCueAsHTML();
>>>>   };
>>>>
>>>> There are attributes in the WebVTTCue object that relate only to cues
>>>> of kind captions and subtitles (vertical, snapToLines etc.). For cues
>>>> of other kinds, the only relevant attribute right now is the text
>>>> attribute.
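As a rough illustration of the WebVTTCue interface quoted above, the sketch below models it in TypeScript. `TextTrackCueSketch` and `WebVTTCueSketch` are hypothetical stand-ins rather than browser APIs, `getCueAsHTML()` is left out because it needs a DOM, and the default values are assumptions. It shows the point being made: a chapter cue built from the same class still carries every caption-only setting, even though only `text` means anything for it.

```typescript
// Hypothetical TypeScript model of the WebVTTCue WebIDL quoted above.
// Not a browser API: names and default values are illustrative assumptions.
type AutoKeyword = "auto";

// Stand-in for HTML's TextTrackCue base interface (identity and timing only).
abstract class TextTrackCueSketch {
  id = "";
  pauseOnExit = false;
  startTime: number;
  endTime: number;
  constructor(startTime: number, endTime: number) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

class WebVTTCueSketch extends TextTrackCueSketch {
  // Settings that only matter for captions and subtitles.
  vertical = "";
  snapToLines = true;
  line: number | AutoKeyword = "auto"; // WebIDL: (long or AutoKeyword)
  position = 50;
  size = 100;
  align = "middle";
  // The only attribute that matters for descriptions, chapters and metadata.
  text: string;
  // getCueAsHTML() is omitted because it needs a DOM.
  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

// A chapter cue built from the same class still carries every caption setting.
const chapter = new WebVTTCueSketch(0, 120, "Introduction");
const captionSettings = ["vertical", "snapToLines", "line", "position", "size", "align"];
const unusedOnChapter = captionSettings.filter((name) => name in chapter);
console.log(`${unusedOnChapter.length} caption-only settings on a chapter cue`);
```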
>>>>
>>>> This works for now, because cues of kind descriptions and chapters are
>>>> only regarded as plain text, and the structure of the content of cues
>>>> of kind metadata is not parsed by the browser. So, for cues of kind
>>>> descriptions, chapters and metadata, the .text attribute is
>>>> sufficient.
>>>>
>>>>
>>>> The consequence
>>>> ===============
>>>> As we continue to evolve the functionality of text tracks, we will
>>>> introduce other, more complex structured content into cues and we will
>>>> want browsers to parse and interpret it.
>>>>
>>>> For example, I expect that once we have support for speech synthesis
>>>> in browsers [1], cues of kind descriptions will be voiced by speech
>>>> synthesis, and eventually we will want to influence that speech
>>>> synthesis with markup (possibly a subset of SSML [2] or some other
>>>> simpler markup that influences prosody).
>>>>
>>>> Since we have set ourselves up for parsing all cue content that comes
>>>> out of WebVTT files into WebVTTCue objects, we now have to expand the
>>>> WebVTTCue object with attributes for speech synthesis. For example, I
>>>> can imagine cue settings for descriptions containing a field called
>>>> "channelMask" that specifies which audio channels a particular cue
>>>> should be rendered into, with values being center, left, right.
>>>>
>>>> Another example is that eventually somebody may want to introduce
>>>> ThumbnailCues that contain data URLs for images and may have a
>>>> "transparency" cue setting. Or somebody may want to formalize
>>>> MidrollAdCues that contain data URLs for short video ads and may have
>>>> a "skippableAfterSecs" cue setting.
>>>>
>>>> All of these new cue settings would end up as new attributes on the
>>>> WebVTTCue object. This is a dangerous design path that we have taken.
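To make the concern concrete, here is a hypothetical TypeScript sketch of a single cue interface after it has absorbed the example settings above: channelMask for descriptions, transparency for ThumbnailCues and skippableAfterSecs for MidrollAdCues. The setting names come from the email; the interface, the types and the `extraSettings` helper are assumptions. Nothing in the type ties a setting to the cue kind it belongs to, so a caption cue carrying an ad setting still type-checks.

```typescript
// Hypothetical sketch: one cue interface absorbing settings for every cue kind.
type ChannelMask = "center" | "left" | "right";

interface MonolithicWebVTTCue {
  startTime: number;
  endTime: number;
  text: string;
  // Captions/subtitles.
  vertical: string;
  snapToLines: boolean;
  line: number | "auto";
  position: number;
  size: number;
  align: string;
  // Descriptions (speech synthesis).
  channelMask?: ChannelMask;
  // ThumbnailCues.
  transparency?: number;
  // MidrollAdCues.
  skippableAfterSecs?: number;
}

// Lists which of the kind-specific optional settings a cue has set.
function extraSettings(cue: MonolithicWebVTTCue): string[] {
  const keys = ["channelMask", "transparency", "skippableAfterSecs"] as const;
  return keys.filter((k) => cue[k] !== undefined);
}

// A caption cue with an ad setting: the type system cannot reject it.
const caption: MonolithicWebVTTCue = {
  startTime: 5,
  endTime: 8,
  text: "Hello",
  vertical: "",
  snapToLines: true,
  line: "auto",
  position: 50,
  size: 100,
  align: "middle",
  skippableAfterSecs: 5,
};
const extras = extraSettings(caption); // ["skippableAfterSecs"]
```

Every new cue kind widens this one interface for all consumers, which is the growth pattern the email warns about.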
>>>>
>>>> [1] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#tts-section
>>>> [2] http://www.w3.org/TR/speech-synthesis/#S3.2
>>>>
>>>>
>>>> Problem analysis
>>>> ================
>>>> By restricting ourselves to a single WebVTTCue object to represent all
>>>> types of cues that come from a WebVTT file, we have ignored that WebVTT
>>>> is just a serialisation format for cues, and that it is the cues that
>>>> provide the different types of timed content to the browser. The
>>>> browser should not have to care about the serialisation format. But it
>>>> should care about the different types of content that a track cue
>>>> could contain.
>>>>
>>>> For example, it is possible that a WebVTT caption cue (one with all
>>>> the markup and cue settings) can be provided to the browser through a
>>>> WebM file or through an MPEG file or in fact (gasp!) through a TTML
>>>> file. Such a cue should always end up in a WebVTTCue object (will need
>>>> a better name) and not in an object that is specific to the
>>>> serialisation format.
>>>>
>>>> What we have done with WebVTT is actually two-fold:
>>>> 1. we have created a file format that serializes arbitrary content
>>>>    that is time-synchronized with a media element, and
>>>> 2. we have created a simple caption/subtitle cue format.
>>>>
>>>> That both are called "WebVTT" is the cause of a lot of confusion and
>>>> not a good design approach.
>>>>
>>>>
>>>> The solution
>>>> ============
>>>> We thus need to distinguish between cue formats in the browser and not
>>>> between serialisation formats (we don't distinguish between different
>>>> image formats or audio formats in the browser either; we just handle
>>>> audio samples or image pixels).
>>>>
>>>> Once a WebVTT file is parsed into a list of cues, the browser should
>>>> not have to care any more that the list of cues came from a WebVTT
>>>> file or anywhere else.
>>>> It's a list of cues with a certain type of content that has a
>>>> parsing and a rendering algorithm attached.
>>>>
>>>>
>>>> Spec consequences
>>>> =================
>>>> What needs to change in the specs to deal with this different approach
>>>> to text tracks is not hard to deduce.
>>>>
>>>>
>>>> Firstly, there are consequences for the WebVTT spec.
>>>>
>>>> I suggest we rename WebVTTCue [1] to VTTCaptionCue and allow such cues
>>>> only on tracks of kind={captions, subtitles}.
>>>> Also, we should separate out the WebVTT serialisation format syntax
>>>> specification from the cue syntax specification [2] and introduce
>>>> separate parsers [3] for the different cue syntax formats.
>>>> The rendering section [4] has already started distinguishing between
>>>> cue rendering for chapters and for captions/subtitles. This will
>>>> easily fit with the now separated cue syntax formats.
>>>>
>>>> We will then introduce a ChapterCue which adds a .text attribute and a
>>>> constructor onto AbstractCue for cues (in WebVTT or from elsewhere)
>>>> that are interpreted as chapters and have their own rendering
>>>> algorithm.
>>>> Similarly, we introduce a DescriptionCue which adds a .text attribute
>>>> and a constructor onto AbstractCue and we define a rendering algorithm
>>>> that makes use of the new speech synthesis API [5].
>>>> Similarly, we introduce a MetadataCue which adds a .content attribute
>>>> and a constructor onto AbstractCue with no rendering algorithm.
>>>> I think these new cue objects would make even more sense being defined
>>>> in HTML, including their rendering algorithms, rather than in the
>>>> WebVTT spec, because they are generic and we don't want chapters to be
>>>> rendered differently just because they originated from a different
>>>> serialisation format.
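The cue split proposed above can be sketched as a small TypeScript class hierarchy. AbstractCue, VTTCaptionCue, ChapterCue, DescriptionCue and MetadataCue are the proposal's names; the fields, defaults and the `renderingFor` helper (including its rendering labels) are hypothetical, and no real speech synthesis call is made. What it illustrates is that the rendering path is chosen from the cue's type, so a ChapterCue is treated the same whatever serialisation it was parsed from.

```typescript
// Hypothetical sketch of the proposed cue hierarchy; not spec text.
abstract class AbstractCue {
  id = "";
  startTime: number;
  endTime: number;
  constructor(startTime: number, endTime: number) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// Captions/subtitles: keeps the WebVTT cue settings (most elided here).
class VTTCaptionCue extends AbstractCue {
  line: number | "auto" = "auto";
  align = "middle";
  text: string;
  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

// Chapters: title text with a chapter-specific rendering.
class ChapterCue extends AbstractCue {
  text: string;
  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

// Descriptions: text intended to be voiced by speech synthesis.
class DescriptionCue extends AbstractCue {
  text: string;
  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

// Metadata: opaque content the browser neither parses nor renders.
class MetadataCue extends AbstractCue {
  content: string;
  constructor(startTime: number, endTime: number, content: string) {
    super(startTime, endTime);
    this.content = content;
  }
}

// Picks a rendering path from the cue type, never from the source format.
function renderingFor(cue: AbstractCue): string {
  if (cue instanceof VTTCaptionCue) return "caption overlay";
  if (cue instanceof ChapterCue) return "chapter menu";
  if (cue instanceof DescriptionCue) return "speech synthesis";
  if (cue instanceof MetadataCue) return "none";
  return "unknown";
}

const cues: AbstractCue[] = [
  new VTTCaptionCue(0, 2, "Hi"),
  new ChapterCue(0, 60, "Intro"),
  new DescriptionCue(3, 5, "A door opens."),
  new MetadataCue(0, 1, '{"ad": false}'),
];
const paths = cues.map(renderingFor);
// ["caption overlay", "chapter menu", "speech synthesis", "none"]
```

Under this design, a later ThumbnailCue would be one more subclass and one more branch, rather than new attributes on every existing cue.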
>>>>
>>>> [1] http://dev.w3.org/html5/webvtt/#webvtt-api
>>>> [2] http://dev.w3.org/html5/webvtt/#syntax
>>>> [3] http://dev.w3.org/html5/webvtt/#parsing
>>>> [4] http://dev.w3.org/html5/webvtt/#rendering
>>>> [5] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#tts-section
>>>>
>>>>
>>>>
>>>> Secondly, there are consequences for the TextTrackCue object hierarchy
>>>> in the HTML spec.
>>>>
>>>> I suggest we rename TextTrackCue [6] to AbstractCue (or just Cue). It
>>>> is simply the abstract result of parsing a serialisation of cues (e.g.
>>>> a WebVTT file) into its individual cues.
>>>>
>>>> Similarly, TextTrackCueList [7] should be renamed to CueList and should
>>>> be a cue list of only one particular type of cue. Thus, the parsing
>>>> and rendering algorithm in use for all cues in a CueList is fixed.
>>>> Also, a CueList of e.g. ChapterCues should only be allowed to be
>>>> attached to a track of kind=chapters, etc.
>>>>
>>>> [6] http://www.w3.org/html/wg/drafts/html/master/single-page.html#texttrackcue
>>>> [7] http://www.w3.org/html/wg/drafts/html/master/single-page.html#texttrackcuelist
>>>>
>>>> Doing this will make WebVTT and the TextTrack API extensible for new
>>>> cue formats, such as cues in SSML format, or ThumbnailCues, or
>>>> MidrollAdCues, or whatever else we may find necessary in the future.
>>>>
>>>> This may look like a lot of changes, but it's really just some
>>>> renaming and the introduction of a small number of semantically clean
>>>> new objects. I'm happy to prepare the patches for the WebVTT and
>>>> HTML5.1 specs if this is agreeable.
>>>>
>>>> Feedback welcome.
>>>>
>>>> Regards,
>>>> Silvia.
>>>>
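The CueList rules in the quoted proposal could be sketched roughly as follows: one cue type per list, and a list may only be attached to a track of the matching kind. CueList and ChapterCue are names from the proposal; `SketchTrack`, `attach()` and the `kinds` parameter are hypothetical. TypeScript generics enforce the cue type at compile time, while the kind check runs at run time because a track's kind is only known then. Sorting by start time and then by end time (latest first) is an assumption loosely modelled on HTML's text track cue order.

```typescript
// Hypothetical sketch of a CueList restricted to one cue type.
type TrackKind = "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

abstract class AbstractCue {
  startTime: number;
  endTime: number;
  constructor(startTime: number, endTime: number) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

class ChapterCue extends AbstractCue {
  text: string;
  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

// Holds cues of a single type T; `kinds` lists the track kinds it may attach to.
class CueList<T extends AbstractCue> {
  readonly kinds: readonly TrackKind[];
  private readonly cues: T[] = [];
  constructor(kinds: readonly TrackKind[]) {
    this.kinds = kinds;
  }
  add(cue: T): void {
    this.cues.push(cue);
    // Start time ascending, then end time descending.
    this.cues.sort((a, b) => a.startTime - b.startTime || b.endTime - a.endTime);
  }
  get length(): number {
    return this.cues.length;
  }
  item(index: number): T | undefined {
    return this.cues[index];
  }
}

class SketchTrack {
  readonly kind: TrackKind;
  cues: CueList<AbstractCue> | null = null;
  constructor(kind: TrackKind) {
    this.kind = kind;
  }
  // Rejects a cue list whose cue type does not belong on this kind of track.
  attach(list: CueList<AbstractCue>): void {
    if (list.kinds.indexOf(this.kind) === -1) {
      throw new Error(`cue list cannot be attached to a track of kind=${this.kind}`);
    }
    this.cues = list;
  }
}

const chapters = new CueList<ChapterCue>(["chapters"]);
chapters.add(new ChapterCue(60, 120, "Part 2"));
chapters.add(new ChapterCue(0, 60, "Part 1"));
new SketchTrack("chapters").attach(chapters); // accepted
```

Attaching the same list to a track of kind=captions would throw, which mirrors the "kind=chapters, etc." restriction in the email.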
Received on Friday, 14 June 2013 07:19:52 UTC