Re: Proposal from HbbTV from Silvia Pfeiffer on 2014-09-30 (public-inbandtracks@w3.org from September 2014)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 30 Sep 2014 20:48:27 +1000
To: Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: Alexander Adolf <alexander.adolf@condition-alpha.com>, "public-inbandtracks@w3.org" <public-inbandtracks@w3.org>, Jon Piesing <Jon.Piesing@tpvision.com>
Message-ID: <CAHp8n2mwH8JBYcUzM+CMnDwE0MafVnKZwtCPCXdA1zJ3c9rHqQ@mail.gmail.com>
On Tue, Sep 30, 2014 at 8:10 PM, Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
>
>
> On 28/09/2014 22:10, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:
>
>>On Thu, Sep 25, 2014 at 10:20 PM, Nigel Megitt <nigel.megitt@bbc.co.uk>
>>wrote:
>>>> More to the point though: a text track with 0 reported cues is
>>>> indistinguishable from a text track where all cues failed parsing.
>>>> Thus it's not obvious whether that will be a usable track or not.
>>>> It's therefore not really a text track, but something special that
>>>> the platform hasn't considered yet.
>>>
>>> Is that exactly correct? Let's look at the mode of the text track and
>>> its readiness state:
>>>
>>> According to
>>>
>>>http://dev.w3.org/html5/spec-preview/media-elements.html#text-track-mode:
>>>
>>> "Disabled
>>>
>>> Indicates that the text track is not active. Other than for the
>>>purposes of
>>> exposing the track in the DOM, the user agent is ignoring the text
>>>track. No
>>> cues are active, no events are fired, and the user agent will not
>>>attempt to
>>> obtain the track's cues."
>>>
>>> And at
>>>http://dev.w3.org/html5/spec-preview/media-elements.html#text-track :
>>>
>>> "The text tracks of a media element are ready if all the text tracks
>>>whose
>>> mode was not in the disabled state when the element's resource selection
>>> algorithm last started now have a text track readiness state of loaded
>>>or
>>> failed to load."
>>>
>>> And at
>>> http://dev.w3.org/html5/spec-preview/media-elements.html#text-track-failed-to-load :
>>>
>>> "Failed to load
>>>
>>> Indicates that the text track was enabled, but when the user agent
>>>attempted
>>> to obtain it, this failed in some way (e.g. URL could not be resolved,
>>> network error, unknown text track format). Some or all of the cues are
>>> likely missing and will not be obtained."
>>>
>>> Taken together these suggest to me that it's legitimate to create a text
>>> track and set it deliberately to mode="disabled" without loading cues,
>>>or to
>>> set it to, say, "showing" and proceed as though it is "ready" even
>>>though
>>> its readiness state is "failed to load", specifically in this case
>>>because
>>> the text track format is unknown. That at least provides a mechanism to
>>> control a media object that can dereference the text track object into
>>> something concrete in the media that it can present, which is what's
>>>needed
>>> here.
>>
>>Thanks for walking through this. You are explaining my point very
>>well, but you need to keep reading. The HTML spec says:
>>
>>"Whenever a text track's text track readiness state changes to either
>>loaded or failed to load, the user agent must remove it from any list
>>of pending text tracks that it is in."
>>
>>Thus, a track that has 'failed to load' is one that is ignored by the
>>browser and cannot display any cues.
>>
>>But that's not even what is going to happen for an in-band track where
>>the UA renders all the cues, but doesn't expose them as TextTrackCue
>>objects. Here's what happens there: the track will be 'disabled'.
>>Then, when it is selected, the UA will go into 'loaded' state once it
>>has parsed all the data and loaded it into internal memory. It won't
>>reach 'failed to load' because the data was able to be obtained and
>>loaded with no fatal errors. However, since it doesn't expose cues,
>>the @cues TextTrackCueList in the TextTrack object of the video element
>>will have 0 cues. Thus, if the JavaScript developer checks on how many
>>cues are being rendered and at what times, they will see "0" and have
>>to assume that the browser has failed to parse any cues. The only
>>reasonable conclusion for the JS developer is to assume that the
>>loading of all cues failed and thus the track is not usable.
>
> That's an unreasonable assumption since if that were the case then the
> state should be 'failed to load'.

Well, the alternative assumption is that no cues were provided.
Neither of these implies, though, that the track 'failed to load'. Even
if some or all of the cues failed to be parsed, the track itself still
loaded fine.
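
To make the ambiguity concrete, here is a rough sketch of what a script
can tell from the exposed state alone. It is only an illustration:
TrackLike, Verdict and classifyTrack are hypothetical names, and
TrackLike stands in for the mode and cues attributes of a real
TextTrack so that the snippet runs outside a browser.

```typescript
// Hypothetical minimal stand-in for the parts of TextTrack used here.
// In a page you would pass a real TextTrack (e.g. video.textTracks[0]).
interface TrackLike {
  mode: "disabled" | "hidden" | "showing";
  cues: { length: number } | null; // null while the track is disabled
}

type Verdict = "not-enabled" | "ambiguous-zero-cues" | "cues-exposed";

// Classify a track using only what a script can observe.
function classifyTrack(track: TrackLike): Verdict {
  if (track.mode === "disabled" || track.cues === null) {
    return "not-enabled";
  }
  // Zero cues: the track may be empty, all cues may have failed parsing,
  // or the UA may render the cues itself without exposing them.
  // The script cannot tell these cases apart.
  return track.cues.length === 0 ? "ambiguous-zero-cues" : "cues-exposed";
}

const renderedByUA: TrackLike = { mode: "showing", cues: { length: 0 } };
const emptyTrack: TrackLike = { mode: "showing", cues: { length: 0 } };
console.log(classifyTrack(renderedByUA) === classifyTrack(emptyTrack)); // true
```

Both tracks look identical to the script, which is the problem.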


> If it's not clear already then we should
> make it so, i.e. that the assumption is that the cues were parsed but
> there are no cues exposed, either because the track actually contained no
> cues or because the cues that were present were not exposed.

I don't see that being the necessary consequence. As I said: I don't
think this forum is the right one to make that decision, since it
requires changes to the HTML spec.


>>>> It's likely better exposed as a video track with burnt-in captions.
>>>> I'd recommend that's how it would be shown in the track list. When
>>>> activated, both the default video track and the captions track would
>>>> then be rendered.
>>>
>>> This pushes the interface complexity somewhere else, but not somewhere
>>> helpful! I'd argue that the spec should get as close as possible to
>>>matching
>>> the media element model and using text tracks for this purpose is better
>>> than not doing so.
>>
>>Why is it not helpful? From the JS and user's point of view, that's
>>exactly what such a track is: a video track with burnt in captions.
>>Since it's now exposed in the list of video tracks, it can be selected
>>and activated. That's all that's required for such a track. That's as
>>useful as it gets, isn't it?
>
> See the comments others have made (including you) later in the thread
> about the relationship between video and audio tracks and text tracks.

So, you agree that it's useful as I proposed it. You're just concerned
about the relationships between the tracks and that indeed needs to be
addressed.


>>> By the way, I agree that exposing data provides interesting
>>> opportunities for developers, where possible. At least creating the
>>> text tracks provides the location for where such data might go, in
>>> case an implementation wants to put it somewhere; hiding the tracks
>>> away behind a 'burnt in video' would effectively block that.
>>
>>What do you mean by "where such data might go"? If the UA renders the
>>data, it can only render it within the video viewport, so for all
>>intents and purposes, it is video data.
>
> I mean 'in the text track cue list', if not in a different subclass of
> TextTrack that offers some other data structure. I wouldn't assume that UA
> rendering can only result in pixels being drawn in the video viewport: for
> example there could be connections to other display or rendering devices.

Since text tracks are part of the video element, text track data's
rendering is restricted to the video viewport's dimensions. If you
want them rendered elsewhere, you need to extend the HTML
specification for that. The only other way to do it is with JavaScript
and for that you need to expose the content of the text track cues.
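
As a rough sketch of that JavaScript route (an illustration only, not
anything taken from a spec): CueLike, CueTrackLike and cueTextAt are
hypothetical names, and the two interfaces stand in for exposed cues
and their track so the snippet runs outside a browser. In a page, the
returned strings could be written into any element the script chooses.

```typescript
// Hypothetical stand-ins for an exposed cue and its track; only the
// fields this sketch needs.
interface CueLike {
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}
interface CueTrackLike {
  cues: ArrayLike<CueLike> | null;
}

// Return the text of every cue active at time t (in seconds), so that a
// script can display it wherever it likes.
function cueTextAt(track: CueTrackLike, t: number): string[] {
  const result: string[] = [];
  if (track.cues === null) {
    return result;
  }
  for (let i = 0; i < track.cues.length; i++) {
    const cue = track.cues[i];
    if (cue.startTime <= t && t < cue.endTime) {
      result.push(cue.text);
    }
  }
  return result;
}

// A track whose cues are exposed gives the script something to render.
const exposed: CueTrackLike = {
  cues: [{ startTime: 0, endTime: 2, text: "Hello" }],
};
// A track rendered only by the UA exposes no cues, so there is nothing
// the script can place elsewhere.
const uaRendered: CueTrackLike = { cues: [] };

console.log(cueTextAt(exposed, 1));    // [ 'Hello' ]
console.log(cueTextAt(uaRendered, 1)); // []
```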

Cheers,
Silvia.
Received on Tuesday, 30 September 2014 10:49:14 UTC