Re: [media] issue-152: documents for further discussion from Silvia Pfeiffer on 2011-04-20 (public-html-a11y@w3.org from April 2011)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 20 Apr 2011 14:38:41 +1000
To: Mark Watson <watsonm@netflix.com>
Cc: Ian Hickson <ian@hixie.ch>, Philip Jägenstedt <philipj@opera.com>, "public-html-a11y@w3.org" <public-html-a11y@w3.org>
Message-ID: <BANLkTikSOvGJZRSD-FkijVBTd3CMCO3VfQ@mail.gmail.com>

Mark,

what is your list of track kinds for in-band tracks?

I have thus far come up with the following:

video:
* sign language video (in different sign languages)
* captions (as in: burnt-in video that may just be overlays)
* different camera angle

audio:
* audio descriptions
* language dub

Cheers,
Silvia.


On Wed, Apr 20, 2011 at 2:26 PM, Mark Watson <watsonm@netflix.com> wrote:
> I'd like to second the requirement for an enumerated 'kind' for in-band tracks. Considering adaptive streaming approaches, the information about track kinds is much more likely to be available in-band than from external metadata systems.
>
> Whilst the external metadata might contain information about what is available, for presentation to the user (e.g. a list of languages), it doesn't make sense to require that layer to include information about how this is mapped to any particular container or manifest format, when multiple such versions of the content might be available (and might be added or removed without changes to the user-visible metadata).
>
> A clean separation between UI and transport implies there is more than just natural-languages at the media transport layer: in fact it would make more sense to have *only* the enumerated kind at the media transport layer and leave the natural language aspects to the presentation layer.
>
> Furthermore, enumerated kinds are necessary to make initial choices based on user preferences - you have to be able to understand *what* the tracks are. Also presentation of such tracks to the user might not be a single menu of choices: there is structure such as language choices for main audio and audio descriptions which mirror the language choices for the corresponding subtitle tracks and this structure needs to be exposed for a sensible UI that properly considers accessibility.
>
> ...Mark
>
> On Apr 19, 2011, at 7:22 PM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:
>
>> On Wed, Apr 20, 2011 at 10:51 AM, Ian Hickson <ian@hixie.ch> wrote:
>>> On Tue, 12 Apr 2011, Silvia Pfeiffer wrote:
>>>>>> the TrackList only includes name and language attributes - in
>>>>>> analogy to TextTrack it should probably rather include (name, label,
>>>>>> language, kind)
>>>>>
>>>>> I'm fine with exposing more data, but I don't know what data in-band
>>>>> tracks typically have. What do in-band tracks in popular video formats
>>>>> expose? Is there any documentation on this?
>>>>
>>>> There is a discussion on the main list about metadata right now and I
>>>> have posted a link there about what the W3C Media Annotations WG's
>>>> analysis of media formats found as typically used metadata on audio and
>>>> video. If you want to understand what is generally available, that is a
>>>> good starting point, see http://www.w3.org/TR/mediaont-10/ .
>>>
>>> Woah, that's a lot of data. I guess a better approach for this will be to
>>> look at use cases and figure out what needs exposing.
>>
>> Yeah, I agree. And by no means am I suggesting to adopt all of them,
>> or even to adopt the complex structure that the WG came up with. I
>> look at it as an interesting analysis of what is available.
>>
>>
>>>> I would, however, regard these two attributes that we discussed here as
>>>> a separate issue, because if somebody wants to create custom controls
>>>> and e.g. provide all the alternative video descriptions in one menu,
>>>> they would want all the text descriptions and audio descriptions listed
>>>> - similarly if they want all the alternative captions in one menu, they
>>>> would want all the text track captions as well as all the videos that
>>>> are created from bitmaps as overlay captions as well as all the
>>>> alternative video tracks with burnt-in captions. So, providing a label
>>>> (for use in the menu) and a kind (for classification) is very useful.
>>>> These can all be mapped from fields from within video formats.
>>>
>>> I assume you're talking primarily about "kind" here. "name" and "label"
>>> are the same thing (actually I've renamed "name" to "label" to improve
>>> consistency with other parts of the platform).
>>
>> Yes, I don't mind if we call it "name" or "label" - I do prefer label
>> to be consistent with the Text Track.
>>
>> I do think we need an additional "id" or similar, which is unique and
>> can be used for fragment addressing. (See the other thread).
>>
>>
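As a rough illustration of the menu use case above, the sketch below groups tracks by kind so that a custom control can list all alternatives of one kind together. The track objects, their field names, and the kind values ("captions", "sign", "description") are assumptions made for this example; they are not taken from any container format or from the spec.

```javascript
// Hypothetical track descriptors using the (id, label, language, kind)
// fields discussed in this thread. All values are illustrative only.
const tracks = [
  { id: "t1", kind: "captions", label: "English captions (text)", language: "en" },
  { id: "v2", kind: "captions", label: "English captions (burnt-in video)", language: "en" },
  { id: "v3", kind: "sign", label: "American Sign Language", language: "ase" },
  { id: "a2", kind: "description", label: "English audio description", language: "en" },
];

// Build one menu section per kind; each section lists the track labels.
function groupByKind(trackList) {
  const menu = new Map();
  for (const track of trackList) {
    if (!menu.has(track.kind)) {
      menu.set(track.kind, []);
    }
    menu.get(track.kind).push(track.label);
  }
  return menu;
}
```

Without a kind on each track, the script has no reliable way to decide which section a label belongs to.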
>>> Looking at the metadata list cited above, I don't see anything in either
>>> ogg, mp4, or webm that maps to "kind", so I don't see much point exposing
>>> that on the audio/video track lists, though I agree that in principle it
>>> would be a good idea.
>>
>> Let's be clear: the media annotations list is not complete for each
>> one of the formats. It is trying to identify a subset that will work
>> across many formats.
>>
>> Also, they actually have a "role" attribute on the "fragment" which
>> they suggest using for identifying the "kind" of a "track":
>> http://www.w3.org/TR/mediaont-10/#example4 . So, it is indeed there.
>> Hmm.. given this, I should probably change what is written for "OGG"
>> under "fragment", because certainly Ogg has fields that provide the
>> kind of a track. WebM and MP4 have them, too.
>>
>>
>>> Realistically though, for in-band tracks it's more likely that that data
>>> will be provided to the script out-of-band so that it can construct the UI
>>> before the movie loads, and for out-of-band tracks the information can be
>>> made available in the markup (e.g. using data-* attributes). For UA-driven
>>> menus, the title is probably sufficient for most purposes, and that can
>>> already be made available.
>>
>> The biggest issue with this approach is discoverability. A Web
>> developer that has to deal with multiple resources for which he
>> doesn't a-priori know what kinds of tracks they have available in-band
>> has no chance to find this out through script if there is no interface
>> that exposes this information. It would need to be done server-side.
>>
>>
>>>>> Note that for media-element-level data, you can already use data-*
>>>>> attributes to get anything you want, so the out-of-band case is
>>>>> already fully handled as far as I can tell.
>>>>
>>>> Interesting. In the audio description case, would a label, kind, and
>>>> language be added to the menu of the related video element?
>>>
>>> For scripted UIs, that's up to the script.
>>>
>>> For UA UIs, it depends if we are talking about multiple video tracks or
>>> multiple audio tracks. Multiple video tracks aren't handled, because
>>> there's no sane way to have the UA turn the video tracks on and off. For
>>> the audio case, I don't really see much reason to expose more than a
>>> title. A kind could be used but it's going to be used so rarely that in
>>> practice the UA will want to handle the case of only having a title
>>> anyway, and once you support that, it's not clear what a kind would really
>>> do to make things better.
>>>
>>> It's something we can always provide in the future though, if it turns out
>>> to be more common than one would guess from looking at content today.
>>
>> I think it will be a problem with the first implementation of this,
>> since we would want to add the information to the menu for audio
>> tracks just like for text tracks and the text tracks have this
>> information (kind, label, language).
>>
>> I guess we can wait till then, though, since it doesn't change
>> anything substantial about the way in which things work.
>>
>>
>>
>>>>> | a group should be able to loop over the full multitrack rather than a
>>>>> | single slave
>>>>>
>>>>> Not sure what this means.
>>>>
>>>> We discussed the looping behaviour. To make it symmetrical with in-band
>>>> multitrack resources, it would make sense to be able to loop over
>>>> composed multitrack resources, too. The expected looping behaviour is
>>>> that a loop on the composed resource loops over the composite as a
>>>> whole. So, the question is then how to turn such looping on.
>>>>
>>>> The proposal is that when one media element in the group has a @loop
>>>> attribute, that would turn the looping on the composite resource on.
>>>> This means that when the loop is set and the end of the composite
>>>> resource is reached (its duration), the currentTime would be reset to
>>>> its beginning and playback of the composite resource would start again.
>>>> Looping on individual elements is turned off and only the composite
>>>> resource can loop.
>>>
>>> What's the use case?
>>
>> The same as for the loop attributes on an audio or video element. It's
>> a media resource and should work consistently to how other media
>> resources are handled.
>>
>> E.g. if I have a plugin that likes to turn all media elements to
>> looping for whatever reason (entertain the kids? ;-), I can do that
>> for normal media elements and for in-band multitrack consistently with
>> the loop attribute, but I have to make an exception for composed
>> multitrack, because it doesn't allow for the handling of a loop
>> attribute. (stupid example, I know: so pick something with music..)
>>
>>
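The looping proposal above could be sketched roughly as follows. Plain objects stand in for the grouped media elements, and taking the composite duration as the longest slave duration is an assumption of this example rather than something the thread settles.

```javascript
// The group loops if at least one element carries the loop flag.
function groupLoops(elements) {
  return elements.some((el) => el.loop);
}

// Assumption: the composite resource lasts as long as its longest element.
function compositeDuration(elements) {
  return Math.max(...elements.map((el) => el.duration));
}

// Given the current position on the composite timeline, return the position
// playback should continue from: back to 0 when the end is reached and the
// group loops, otherwise the (clamped) current position.
function nextTime(elements, currentTime) {
  const end = compositeDuration(elements);
  if (currentTime >= end && groupLoops(elements)) {
    return 0;
  }
  return Math.min(currentTime, end);
}
```

Individual elements never loop on their own in this sketch; only the composite timeline wraps around.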
>>
>>>>> | some attributes of HTMLMediaElement are missing in the MediaController
>>>>> | that might make sense to collect state from the slaves: error,
>>>>>
>>>>> Errors only occur as part of loading, which is a per-media-element
>>>>> issue, so I don't really know what it would mean for the controller to
>>>>> have it.
>>>>
>>>> The MediaController is generally regarded as the state keeper for the
>>>> composite resource.
>>>
>>> It is? That's certainly not how it's defined. It's just a central
>>> controller, it doesn't keep any of the state for the resources.
>>
>> Not for the individual ones, but for the combined construct. E.g. you
>> can ask it for what the currentTime of the combined construct is, etc.
>>
>>
>>>> So, what happens when a single slave goes into error state? Does the
>>>> full composite resource go into error state? Or does it ignore the slave
>>>> - turn it off, and continue?
>>>
>>> Media elements don't really have an error state. They have a networkState
>>> and a readyState, which affect the MediaController, but the 'error'
>>> attribute is just for exposing the last error for events, it's not part of
>>> the state machine.
>>
>> That still doesn't answer the question: what happens if one of the
>> slaves happens to have a network error and cannot continue playing,
>> because it runs out of data. Does the combined resource stall?
>> Forever? Is there a way for script to identify this and remove the
>> stalling slave from the group? Maybe we need an onerror event on the
>> MediaController, which will be raised if one of the slaves has an
>> error fetching the media data. Then the script developer can go
>> through the list of slaves in the one callback and remove the
>> contender.
>>
>>
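A minimal sketch of the callback suggested above, assuming a controller-level error event existed. The `group` object and its `slaves` array are hypothetical stand-ins; only the idea that a media element's `error` attribute is null when no error has occurred comes from the existing media element API.

```javascript
// Split the slaves into those still usable and those that reported an error.
function partitionByError(slaves) {
  const healthy = [];
  const failed = [];
  for (const slave of slaves) {
    (slave.error ? failed : healthy).push(slave);
  }
  return { healthy, failed };
}

// Hypothetical handler for an error event on the group: drop the failed
// slaves so the rest can continue, and report which ones were removed.
function onControllerError(group) {
  const { healthy, failed } = partitionByError(group.slaves);
  group.slaves = healthy;
  return failed.map((slave) => slave.id);
}
```

With per-element handlers only, the script would have to register the same logic on every slave and keep the group membership in sync itself.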
>>>>> | readyState
>>>>>
>>>>> I could expose a readyState that returns the lowest value of all the
>>>>> readyState values of the slaved media elements, would that be useful?
>>>>> It would be helpful to see a sample script that would make use of
>>>>> this; I don't really understand why someone would care about doing
>>>>> this at the controller level rather than the individual track level.
>>>>
>>>> I think it makes sense, in particular when script is waiting for all
>>>> elements to go to HAVE_METADATA state, which is often the case when you
>>>> are trying to do something on the media resource, but have to wait until
>>>> it's actually available.
>>>>
>>>> An example JS would be where you are running your own controls for the
>>>> combined resource and want to determine the combined duration and volume
>>>> for visual display, e.g.
>>>>
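>>>>      // Assumes "duration" and "vol" refer to elements that display the values.
>>>>      video.controller.addEventListener("loadedmetadata", init, false);
>>>>      function init(evt) {
>>>>        // Show the combined duration and volume once metadata is available.
>>>>        duration.innerHTML = video.controller.duration.toFixed(2);
>>>>        vol.innerHTML      = video.controller.volume.toFixed(2);
>>>>      }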
>>>>
>>>> So, I think a combined readyState makes sense in the way you described.
>>>
>>> That example doesn't use readyState at all. Is there a use case for
>>> readyState specifically?
>>
>>
>> We actually discussed in the last call whether we only need the events
>> or also readyState. Eric had an example where you would raise an
>> event, but only do something if the element is in a particular
>> readyState at the time of processing. I don't remember exactly what it
>> was. My position is that we really need the events, but I could live
>> without having a combined readyState.
>>
>>
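For reference, the combined readyState described above (the lowest value across all slaved elements) could be computed along these lines. The numeric constants match the media element readyState values; returning HAVE_NOTHING for an empty group is an assumption of this sketch.

```javascript
// readyState values as defined for media elements.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;

// The group is only as ready as its least ready slave.
function combinedReadyState(slaves) {
  if (slaves.length === 0) {
    return HAVE_NOTHING; // assumption: an empty group has nothing loaded
  }
  return Math.min(...slaves.map((slave) => slave.readyState));
}

// True once every slave has at least its metadata.
function allHaveMetadata(slaves) {
  return combinedReadyState(slaves) >= HAVE_METADATA;
}
```

An event handler could call `allHaveMetadata` before touching duration or dimensions of the combined resource.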
>>>>> | (this one is particularly important for onmetadatavailable events)
>>>>>
>>>>> The events are independent of the attributes. What events would you
>>>>> want on a MediaController, and why? Again, sample code would really
>>>>> help clarify the use cases you have in mind.
>>>>
>>>> Maybe an onmetadatavailable event is more useful than a readyState then?
>>>
>>> I've updated the spec to fire a number of events on MediaController,
>>> including 'metadataavailable' and 'playing'/'waiting'.
>>
>> Ah excellent. That's great.
>>
>>
>>>> I am not aware of many scripts that use the readyState values directly
>>>> for anything, even on the media elements themselves.
>>>
>>> One example of readyState usage on a media element would be a 'waiting'
>>> event handler that checks whether readyState is HAVE_CURRENT_DATA or
>>> HAVE_METADATA and uses that information to decide whether to display a
>>> poster frame overlay or not.
>>>
>>>
>>>>> | TimeRanges played
>>>>>
>>>>> Would this return the union or the intersection of the slaves'?
>>>>
>>>> That would probably be the union, because those parts of the timeline
>>>> are what the user has viewed, so he/she would expect them to be marked
>>>> in manually created controls.
>>>
>>> Ok, added.
>>>
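The union of the slaves' played ranges amounts to merging overlapping intervals. In the sketch below each slave's played ranges are represented as an array of [start, end] pairs in seconds; that representation is an assumption for the example and is not the TimeRanges interface itself.

```javascript
// Merge the played ranges of all slaves into one sorted, non-overlapping list.
function unionRanges(rangeLists) {
  // Copy every pair so the inputs are not modified, then sort by start time.
  const all = rangeLists
    .flat()
    .map(([start, end]) => [start, end])
    .sort((a, b) => a[0] - b[0]);

  const merged = [];
  for (const [start, end] of all) {
    const last = merged[merged.length - 1];
    if (last && start <= last[1]) {
      // Overlaps or touches the previous range: extend it.
      last[1] = Math.max(last[1], end);
    } else {
      merged.push([start, end]);
    }
  }
  return merged;
}
```

For example, if one slave has played 0-5 and 10-15 and another has played 3-8, the controller would report 0-8 and 10-15.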
>>>
>>>>> | ended
>>>>>
>>>>> Since tracks can vary in length, this doesn't make much sense at the
>>>>> media controller level. You can tell if you're at the end by looking
>>>>> at currentTime and duration, but with infinite streams and no
>>>>> buffering the underlying slaves might keep moving things along
>>>>> (actively playing) with currentTime and duration both equal to zero
>>>>> the whole time. So I'm not sure how to really expose 'ended' on the
>>>>> media controller.
>>>>
>>>> "ended" on the individual elements (in the absence of loop) returns true when
>>>>
>>>> Either:
>>>>
>>>>    The current playback position is the end of the media resource, and
>>>>    The direction of playback is forwards.
>>>>
>>>> Or:
>>>>
>>>>    The current playback position is the earliest possible position, and
>>>>    The direction of playback is backwards.
>>>
>>> No, 'ended' only fires when going forwards.
>>
>>
>> I quoted from the spec:
>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#ended-playback
>>
>>
>>>> So, in analogy, for the composed resource: it would return the conjunction
>>>> of the ended results of all individual elements, namely "ended" only when
>>>> all of them are in ended state.
>>>
>>> But what's the use case?
>>
>> If I reach the end, I want to present something different, such as a
>> post-roll ad or an overlay with links to other videos that are
>> related. It is much easier to wait on an onended event on the combined
>> resource than having to register an event handler with each slave and
>> then try and combine the result.
>>
>>
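To show what combining the result by hand involves, here is a sketch of the per-slave bookkeeping a script needs when there is no ended event on the combined resource. The slave objects and the idea of calling the returned function from each slave's own ended handler are assumptions for illustration.

```javascript
// Returns a function to call whenever one slave ends. The callback fires
// once, when every slave in the group has ended.
function watchGroupEnd(slaves, onAllEnded) {
  const pending = new Set(slaves.map((slave) => slave.id));
  let fired = false;
  return function slaveEnded(id) {
    pending.delete(id);
    if (!fired && pending.size === 0) {
      fired = true;
      onAllEnded(); // e.g. show a post-roll or links to related videos
    }
  };
}
```

A single ended event on the controller would replace all of this with one listener.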
>>>>> | and autoplay.
>>>>>
>>>>> How would this work? Autoplay doesn't really make sense as an IDL
>>>>> attribute, it's the content attribute that matters. And we already have
>>>>> that set up to work with media controllers.
>>>>
>>>> As with @loop, it would be possible to say that when one media element
>>>> in the union has @autoplay set, then the combined resource is in
>>>> autoplay state.
>>>
>>> I don't understand the use case for exposing this as an IDL attribute on
>>> the controller.
>>
>> Same use case as for any other media element - and then to have it
>> consistent, so that we can use the same code to deal with grouped
>> media elements as with in-band multitrack elements. For @autoplay it
>> even has an accessibility use case: it's possible for a UA or plugin
>> to provide settings to stop autoplay on media elements with the
>> @autoplay IDL attribute. It's not easily possible to stop hand-coded
>> play() calls, which would be the only way for grouped multitrack media
>> in the way in which it is currently specified.
>>
>>
>>
>>>> One more question turned up today: is there any means in which we could
>>>> possibly create @controls (with track menu and all) for the combined
>>>> resource? Maybe they could be the same controls on all the elements that
>>>> have a @controls active, but would actually be driven by the
>>>> controller's state rather than the element's? Maybe the first video
>>>> element that has a @controls attribute would get the full controller's
>>>> state represented in the controls? Could there be any way to make
>>>> @controls work?
>>>
>>> The UA is responsible for this, but the spec requires that the UI
>>> displayed for a control that has a controller control the controller.
>>
>> That's good. Question of clarification: Does that mean that for all
>> elements in a group that display controls these controls actually
>> control the controller? Or do only the controls of the media element
>> that created the controller control the controller?
>> (hope that makes sense to you ;-)
>>
>>
>> Cheers,
>> Silvia.
>>
>>
>
Received on Wednesday, 20 April 2011 04:39:29 UTC