
Re: [minutes] 20090821 Timed Text teleconference

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 26 Aug 2009 11:06:10 +1000
Message-ID: <2c0e02830908251806n28ad2381l8f79df97b071dc0e@mail.gmail.com>
To: Sean Hayes <Sean.Hayes@microsoft.com>
Cc: Philippe Le Hegaret <plh@w3.org>, "public-tt@w3.org" <public-tt@w3.org>
Hi Sean,

On Tue, Aug 25, 2009 at 8:48 PM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:

> I understand the idea that HTML considers the resource monolithic, and
> that it is up to the user agent to gate access to the resources in it. This
> is fine in some situations, but late binding of such information is an
> important model too. HTML also defines a case where the author creates the
> controls; they can't have it both ways. If they want the author to be able
> to build the UI, then they will need a general-purpose media API.


I am a bit torn on this subject. I do believe a composition language for
media presentations would be nice to have, though it may not necessarily
need to be part of HTML. It would be awesome if the <object> or <embed> tag
(or even <audio> and <video>, where it's clear that the result will be an
audio or video resource) were able to link to a SMIL file, to a file of the
kind your example below describes, or even to something similar to what we
experimented with for Ogg, called ROE (see http://wiki.xiph.org/ROE).

I think at this stage of the development of HTML, it's a bit premature to
decide what would be the best way to integrate this. But doing a demo would
be totally appropriate, and I think the HTML WG would be rather keen to see
it.

<..>

> Honestly, I believe HTML would do better to just go back to an <object> and
> <embed> model, and leave all the complexity to the user agent and plugins.
> However, they seem set on trying to define a media handling subsystem,
> which is going to entail complexity.


I think the time is right to include audio and video as native resource
types, just as there is for images. And that was all that this version of
HTML set out to do. Now, this opens up all kinds of possibilities, including
the potential for dynamic composition. But the Web world is not ready for
this yet - it needs a couple of years of experience with the <video>
element, I think, before there will be a recognized need for dynamic media
composition. Until then, it will be done in JavaScript if somebody has a
need. And that's not a bad thing.
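For instance, a rough sketch of what such script-side composition could look
like today: picking one of several alternate sources by MIME type. (The
track descriptions and the canPlay predicate here are illustrative stand-ins
I made up, not any HTML5 API; in a browser the predicate would wrap the
video element's canPlayType() method.)

```javascript
// Hypothetical sketch: script-driven selection among media alternates.
// Returns the first alternate whose MIME type the client reports it
// can play, or null if none is supported.
function selectAlternate(alternates, canPlay) {
  for (const alt of alternates) {
    if (canPlay(alt.type)) {
      return alt;
    }
  }
  return null;
}

// A stand-in list of supported types keeps the sketch self-contained;
// in a page this would be video.canPlayType(type) !== ''.
const supported = new Set(['video/ogg; codecs="theora, vorbis"']);
const canPlay = (type) => supported.has(type);

const chosen = selectAlternate(
  [
    { src: 'foo.big.vid', type: 'video/mp4' },
    { src: 'foo.small.vid', type: 'video/ogg; codecs="theora, vorbis"' },
  ],
  canPlay
);
// chosen.src is 'foo.small.vid', the first playable alternate
```

A page script would then assign chosen.src to the video element's src
attribute, which is essentially the selection that a declarative switch
would do for you.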

I would not necessarily want HTML to start including the time container
> elements, although there is a historical precedent for that with HTML+TIME.
> But if they are going to allow alternates based on codec and media
> attributes and media queries on player capability, this could all get very
> complicated very quickly (at which point a single pointer to a dedicated
> media setup file starts to look like a good option to me).


Yes, I agree, the trick is to keep it simple.


> In language design I'm generally in favor of a few general tools that work
> in as many places as possible, and so for options I think something along
> the lines of a general SVG <switch> statement should be used throughout
> HTML, but it could be specialized for this context.
>
> In terms of a concrete example I'd see it looking something like:
>
> media:    (video | audio | text | switch)*    # plus possible other types,
> like animation etc.
>
> Each of the sub-elements of media (a nominal par element) is deemed to be
> a track (in the MPEG sense) played in parallel (when selected), and one of
> these can be defined as the syncMaster. The switch element would allow the
> definition of alternates for each track, based on the combination of media
> queries, codec and MIME types, and so on, of its child elements; the
> elements in the switch could inherit attributes from the switch to save
> some duplication. Script can change the selected attribute (or this could
> be a CSS attribute) in the DOM to turn the various tracks on and off, and
> in theory a pseudo-DOM could be generated for an MPEG or other container
> to look much the same from the script's point of view.
>
> e.g.
> <media>
>        <switch id="mainvideo" selected="on" syncMaster="true">
>                <video media="..." src="foo.big.vid" ... />
>                <video media="..." src="foo.small.vid" ... />
>        </switch>
>        <audio id="mainAudio" src="bar.aud" selected="on" ... />
>        <audio id="audioDescription" src="bar.sap.aud" selected="off" ... />
>        <switch id="captions" selected="off">
>                <text media="..." src="baz.ttxt" ... />
>                <text media="..." src="baz.srt" ... />
>        </switch>
>        <switch id="subtitles" selected="off">
>                <text media="..." src="english.ttxt" ... />
>                <text media="..." src="english.srt" ... />
>                <text media="..." src="french.ttxt" ... />
>                <text media="..." src="french.srt" ... />
>        </switch>
> </media>
>
> As you can see though, this is likely to get out of hand, but each of these
> features is critical to someone; HTML is going to have a hard time coming to
> consensus on a simple subset. I expect to re-raise this discussion in the
> pf/html accessibility task force, but I'm happy to continue the debate here
> too.


I like the example - it's also very similar to what we came up with for ROE.
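To make the script side of your example concrete, here is a rough sketch of
the "turn tracks on and off via the selected attribute" idea. (The plain
track objects stand in for the proposed DOM elements; neither <media> nor
the selected attribute exist in HTML5, so this is just a model of the
behavior you describe.)

```javascript
// Hypothetical sketch of the proposed script interface: flip the
// "selected" attribute of a named track, leaving the others untouched.
function setSelected(tracks, id, on) {
  for (const track of tracks) {
    if (track.id === id) {
      track.selected = on ? 'on' : 'off';
    }
  }
  return tracks;
}

// Mirror of the <media> example above: captions are off by default,
// then switched on by script, much as a user menu would do.
const tracks = [
  { id: 'mainvideo', selected: 'on' },
  { id: 'mainAudio', selected: 'on' },
  { id: 'audioDescription', selected: 'off' },
  { id: 'captions', selected: 'off' },
];
setSelected(tracks, 'captions', true);
// the captions track is now selected: 'on'; the others are unchanged
```

In the real DOM this would presumably be an attribute change, e.g.
element.setAttribute('selected', 'on'), with the user agent reacting by
starting or stopping playback of that track.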



Regards,
Silvia.


> Sean
> -----Original Message-----
> From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com]
> Sent: 25 August 2009 12:49 AM
> To: Sean Hayes
> Cc: Philippe Le Hegaret; public-tt@w3.org
> Subject: Re: [minutes] 20090821 Timed Text teleconference
>
> On Tue, Aug 25, 2009 at 8:22 AM, Sean Hayes <Sean.Hayes@microsoft.com>
> wrote:
> > Yes, I've looked at this; we did discuss it briefly, although the notes
> > don't reflect it.
> >
> > I think it's a reasonable basic approach, but I'd like to see it be a
> > little more general, allowing for the possibility of audio description
> > tracks using the <audio> tag for <video>, and even sign translation
> > tracks. Also, I don't think there is a need to invent much: SMIL <audio>
> > and <video> are in fact synonyms of the more general media <ref>, as are
> > <text> and <animation> and others. In my opinion HTML could adopt this
> > same notion. I don't think HTML should be looking at importing a lot of
> > SMIL (although I guess they could reference a SMIL file with a media
> > tag), but by adopting the basic media module <ref> and its synonyms, and
> > allowing the sync* attributes, this could all be achieved.
>
> Hi Sean,
>
> As it stands, the HTML5 video element expects audio description tracks and
> sign translation tracks to come through the binary resource as additional
> audio or video tracks of the video file. Thus, they cannot be created
> through HTML markup. However, I think what is missing is the notion that a
> Web browser should parse such tracks and add them to a menu created for
> the video, so users can activate them.
>
> HTML does not like dealing with abstract resources (this was what the
> <embed> and <object> elements did), but rather prefers dealing with concrete
> data for which specific attributes and API calls can be made available. For
> example, width and height attributes don't make much sense on an audio
> element, while they are very important for video.
> Thus, I don't think a general <ref> element makes much sense, since it
> would be a step back towards what <embed> and <object> used to be.
>
> Also, I am curious about the sync attributes you are referring to. I found
> endsync
> http://www.w3.org/TR/2008/REC-SMIL3-20081201/smil-timing.html#adef-endsync
> , which requires par and seq elements and thus creates a whole media
> composition language. Is this what you are after? Can you give an example of
> what you would like to see available in HTML5 in that respect?
>
> Thanks,
> Silvia.
>
>
Received on Wednesday, 26 August 2009 01:07:14 GMT
