Re: Survey ready on Media Text Associations proposal

Hi Dick,

Thanks for checking back with the SMIL specification on this proposal.
It is good to do this now for two reasons: firstly to check if SMIL
already has a construct that satisfies the needs, and secondly to see
if SMIL has some functionality that we have missed for the track
proposal.

So, let me give this a thorough analysis.

On Fri, Mar 5, 2010 at 11:10 PM, Dick Bulterman <Dick.Bulterman@cwi.nl> wrote:
> On the track proposal, just to make sure I'm not missing something:
> Is there an implied preference order in the statements:
>>
>> <trackgroup media="accessibility(captions:yes") >
>>>
>>>  <track src="en.srt" lang="en" enabled >
>>>  <track src="fr.srt" lang="fr" >
>>>  <track src="de.srt" lang="de" >
>>> </trackgroup>
>
> (In other words, the implied preference order is English, French, German.)
>
> Compare this to the SMIL way of doing the same thing:
>  <par>
>    <video src="..." />
>    <switch systemCaptions="on" allowReorder="yes">
>      <textstream src="en.xxx" systemLanguage="en" />
>      <textstream src="fr.xxx" systemLanguage="de" />
>      <textstream src="de.xxx" systemLanguage="de" />
>    </switch>
>  </par>
> The default behavior is that the first candidate matching a set language
> preference is used. The 'allowReorder' attribute explicitly allows a user
> agent the reorder the order of options if the user (via the UA) has
> determined that he/she prefers German over French.


Yes, the DOM in HTML does indeed have a defined order, namely tree
order. This is important so that scripting languages and things such
as XPath can work on a document in a predictable way.
Reordering in HTML is not done using an attribute. If there is a need
for reordering, one uses JavaScript. Thus, the @allowReorder attribute
is not required for HTML.
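
As a rough sketch of what reordering by script could look like: the
function below sorts a list of track descriptions by a user's language
preference. The name reorderByPreference, the preferredLangs parameter
and the plain-object shape of the tracks are assumptions made for
illustration and are not part of the proposal. In a page, a script
would apply the same ordering to the actual track elements in the DOM.

```javascript
// Hypothetical helper: reorder tracks by the user's language preference.
// Tracks whose language is not in the preference list keep their tree
// order and come after all preferred ones.
function reorderByPreference(tracks, preferredLangs) {
  const rank = (track) => {
    const i = preferredLangs.indexOf(track.lang);
    return i === -1 ? preferredLangs.length : i;
  };
  // Carry the original index along so that ties fall back to tree order.
  return tracks
    .map((track, index) => ({ track, index }))
    .sort((a, b) => rank(a.track) - rank(b.track) || a.index - b.index)
    .map((entry) => entry.track);
}

// Same three tracks as in the quoted example, in tree order.
const tracks = [
  { src: "en.srt", lang: "en" },
  { src: "fr.srt", lang: "fr" },
  { src: "de.srt", lang: "de" },
];

// A user who prefers German over French.
const reordered = reorderByPreference(tracks, ["de", "fr"]);
```

With the preference list ["de", "fr"], the result is ordered de.srt,
fr.srt, en.srt, and the input array is left untouched.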


> Note also that in this example, the entire <switch> is only evaluated if the
> user (agent) has determined that captions are required.

In the given proposal, the track elements will always be in the DOM
and will always be parsed and evaluated by the UA. However, whether
the external resource is loaded or not is described by the resource
selection algorithm
(http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations#Resource_selection_algorithm).

I do not think this is a fundamental difference, though - if we were
to adopt the SMIL markup (and I am not suggesting we should - I am
just hypothetically analysing this situation), it would need to work
this way because that's how HTML works.


> Note finally that if
> the user has the language preference Dutch, no captions will play (since he
> presumably can't understand them anyway). Having a final statement in a
> <switch> without a predicate determines a result that will allows play if no
> earlier option (reodered or not) do not provide a preference match.

In our case, this is something that is up to the Web author and the
UA's decisions for presentation.

If the Web author decides to put a default "enabled" attribute on one
of the tracks, then that track will be presented to everyone unless
their browser preference settings indicate that they want the German
or the French track. Also, there will be a menu through which users
can activate other tracks.

If the Web author "enables" none of the tracks, no subtitles will be
shown unless you have a browser preference setting that indicates that
you always want subtitles on when they are available in your language.

Those browser preference settings, incidentally, are just
recommendations for the browser vendors on how to implement support
for these elements. What browsers actually do is not something that is
defined in HTML.
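
To make the behaviour described above a bit more concrete, here is a
minimal sketch of how a UA might pick a track. It is not a copy of the
resource selection algorithm on the wiki. The function name
selectTrack and the shape of the prefs object (lang,
alwaysShowCaptions) are assumptions for illustration only, and a real
browser is free to do something different.

```javascript
// Hypothetical sketch of the recommended (non-normative) behaviour.
// tracks: array of { src, lang, enabled } in tree order.
// prefs:  { lang, alwaysShowCaptions } -- an assumed shape for the
//         user's browser preference settings.
function selectTrack(tracks, prefs) {
  const authorDefault = tracks.find((t) => t.enabled) || null;
  const languageMatch = prefs.lang
    ? tracks.find((t) => t.lang === prefs.lang) || null
    : null;

  if (authorDefault) {
    // The author enabled a track: present it unless the user's
    // preference points at another available track.
    return languageMatch || authorDefault;
  }
  // No author default: only present a track if the user asked for
  // subtitles whenever they are available in their language.
  return prefs.alwaysShowCaptions ? languageMatch : null;
}

const withDefault = [
  { src: "en.srt", lang: "en", enabled: true },
  { src: "fr.srt", lang: "fr", enabled: false },
  { src: "de.srt", lang: "de", enabled: false },
];
const withoutDefault = withDefault.map((t) => ({ ...t, enabled: false }));
```

Under these assumptions, a user with a Dutch language preference gets
the English track when the author enabled it, and gets no subtitles at
all when no track is enabled.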


> Are the semantics of <trackgroup> similar? (If so, why invent something new;
> if not, are at least the SMIL semantics supported?)

This is a much more general question than the specific questions
above. So, let me reply to it in detail.

If we were to introduce the SMIL approach, we would require the
introduction of the following elements:
* par
* switch
* textstream

instead of introducing:
* track
* trackgroup


Let's start with the "easy" comparison: track vs textstream.

We had lengthy discussions about whether we should add an element that
explicitly only links to external text streams or is able to also be
applied to other types of content, such as external audio or video
tracks. The consensus was that making it a general element would be a
lot more appropriate. The @type attribute would in any case hint at
what kind of resource is being linked to, so it doesn't need to be
explicit in the element name. Thus, track is a lot more generic than
textstream and it's not appropriate to adopt textstream for this use
case.

Now the more complicated comparison: trackgroup vs switch

The switch element is indeed used for a similar purpose as the
trackgroup element here: allowing only one out of the list of elements
inside it to be active. The switch element has an @id and a @title
attribute. The trackgroup element has several more attributes that
signify its purpose of grouping tracks that have something in common.
It thus inherits most of the attributes from the track element. To
give switch the same purpose, it would need to be extended beyond
the SMIL specification. Further: with the chosen name we reused a name
that is already being used in MPEG for signifying alternate tracks. I
don't think renaming trackgroup to switch will earn us much, but it
would of course be an option.

Finally the extra element: par

par is required in SMIL because SMIL is good for compositing media
resources together. Thus, the video element and the textstream
elements are composited together as parallel resources. seq and switch
contribute to that functionality, too, and are really important for
something as flexible as SMIL.

In HTML5, a media element is not regarded as a composited resource.
There is a main resource and it is the important bit - everything else
is just additional information on top of that. Or speaking concretely
in our example: the external tracks make no sense without the video
element. Therefore, there is a dependency between the track elements
and the video element, which is expressed by having the video element
be the parent element. This makes total sense in HTML5, but no sense
in SMIL at all. This is the reason why par is not necessary in HTML5:
there are no parallel resources that enjoy equal rights. There is a
main resource and it dominates all other linked resources, all
external text associations and all other associated media. Its
duration defines the timeline, defines the duration of the media
element, defines the playback position, defines events etc. It's the
master. There is no need for "par" in HTML, since the media element in
itself is already the time master that "par" would be.


So, in summary as a reply to your question: No, the semantics of {par,
switch, textstream} are not identical to the semantics of {track,
trackgroup}. A renaming of trackgroup to switch would be possible, but
it would have different attributes than the SMIL switch element and
thus would not be semantically equal either. There are reasons for the
design decisions for {track, trackgroup} and why {par, switch,
textstream} didn't satisfy the requirements.


Best Regards,
Silvia.

Received on Saturday, 6 March 2010 00:55:09 UTC