- From: Philip Jägenstedt <philipj@opera.com>
- Date: Wed, 05 May 2010 12:01:53 +0800
- To: "Dick Bulterman" <Dick.Bulterman@cwi.nl>, "HTML Accessibility Task Force" <public-html-a11y@w3.org>, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>
On Wed, 05 May 2010 06:44:19 +0800, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> Dear Dick, all,
>
> In light of yesterday's developments with the HTML5 draft [1][2], all
> the proposals that were made in this group have now somewhat
> contributed to progress, but are superseded, so a new discussion needs
> to be had.
>
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#the-track-element
> [2] http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#timed-tracks
>
> However, I would hate to give the perception that Dick's technical
> concerns are not being considered by this group, so I've decided to
> formulate a reply.
>
> On Tue, May 4, 2010 at 8:12 AM, Dick Bulterman <Dick.Bulterman@cwi.nl> wrote:
>>
>> 1. The name 'track' for identifying a text object within a video
>> element is misleading. It may lead people to think that any arbitrary
>> data type could be specified (such as an audio track, an animation
>> track or even a secondary video track). Since this proposal is
>> purportedly intended to allow the activation of external text tracks
>> only, a more reasonable name would be 'textTrack' or 'textStream'.
>
> This was discussed before; see e.g.
> [3] http://lists.w3.org/Archives/Public/public-html-a11y/2010Mar/0163.html
> [4] http://lists.w3.org/Archives/Public/public-html-a11y/2010Apr/0175.html
>
> In particular, in [4] it is stated that the <track> element is
> explicitly designed by the group (see the thread at [5]) to allow
> using it for externally associated, dependent audio or video tracks
> (in particular an audio description or a sign language track).
>
> [5] http://lists.w3.org/Archives/Public/public-html-a11y/2010Feb/0226.html
>
>> 2. The name 'trackGroup' is equally misleading. In other languages, a
>> 'group' element is used to aggregate child elements; here, it is used
>> to select child elements.
>> As with 'track', it also gives the impression that a choice can be
>> made within a selection of general tracks, which is not true. A name
>> such as 'textSelect' or 'captionSelect' might be more useful. (The
>> 'switch' would only be appropriate if all semantics of the SMIL
>> switch were adapted.)
>
> This was discussed before, e.g.
> [6] http://lists.w3.org/Archives/Public/public-html-a11y/2010Apr/0085.html
> [7] http://lists.w3.org/Archives/Public/public-html-a11y/2010Apr/0086.html
> [8] http://lists.w3.org/Archives/Public/public-html-a11y/2010Mar/0133.html
>
> In particular, in [8] it is stated that <trackgroup> was chosen
> because it is already in use in MPEG files for the identical purpose,
> so it made sense to reuse a term the industry would already
> understand. It is also discussed there that the SMIL <switch> element
> does not meet all the requirements for this element.
>
>> 3. The semantics defined by Silvia for managing selection based on
>> lexical ordering are not clear to me. It seems that the children are
>> first processed to resolve 'source' elements, then 'track' elements
>> (and then trackGroups)? What happens when things appear out of order
>> (such as having 'source' elements interspersed among track elements)?
>
> This has indeed not been raised before, but it is actually not a
> problem, since <source> and <track> do not interfere with each other.
> The <source> elements are evaluated according to the current
> specification of HTML5 [9], ignoring any <track> elements. Similarly,
> our proposal is to evaluate the <track> elements in tree order [10],
> which would not interfere with <source>. A <trackgroup> element simply
> functions like another <track> element, since only one of the
> <track>s inside it would ever be active.
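To make the tree-order point concrete, a <video> element under this proposal would be shaped roughly like the following sketch. The element names are from the proposal itself; the file names and attributes here are illustrative only, not settled syntax:

```html
<video controls>
  <!-- <source> elements: evaluated per the HTML5 resource selection
       algorithm [9], ignoring any <track> elements -->
  <source src="video.webm" type="video/webm">
  <source src="video.mp4" type="video/mp4">
  <!-- <track> elements: evaluated in tree order [10]; the <trackgroup>
       counts as a single <track>, since only one of its children
       would ever be active at a time -->
  <track src="chapters.srt">
  <trackgroup>
    <track src="captions.nl.srt" srclang="nl">
    <track src="captions.fr.srt" srclang="fr">
    <track src="captions.en.srt" srclang="en">
  </trackgroup>
</video>
```

The two selection passes never compete: removing every <track> would leave <source> selection unchanged, and vice versa.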
> [9] http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-algorithm
> [10] http://lists.w3.org/Archives/Public/public-html-a11y/2010Mar/0133.html
>
>> 4. The assumption that there are no synchronization conflicts between
>> a video master and the text children strikes me as overly simplistic:
>> it is not practical to simply buffer a full set of captions in all
>> cases. Consider mobile phone use: if a given video had captions in
>> French/Dutch/English, would all three sets be downloaded before the
>> first video frame is displayed? What happens if someone turns on
>> captions while the video is active: does the video pause while the
>> captions are loaded? If the SRT files are large, significant data
>> charges could be incurred, even if the video itself were not played.
>
> Yes, these are good concerns to discuss. We could have had this
> discussion here and probably still should. Ian also had this
> discussion with me and several others when putting his requirements
> document together [11].
>
> The current state of thinking as expressed in [11] is that captions
> that are active (i.e. have an @active attribute) will be loaded with
> the video, and <video> will not go into the METADATA_LOADED state
> until all of this data has been received. Also, if somebody turns on
> captions during playback, the video will pause until the captions are
> loaded.
>
> I'm not sure I personally agree with the latter - I would prefer "best
> effort" rather than pausing, but we should discuss this.
>
> [11] http://wiki.whatwg.org/wiki/Timed_tracks
>
>> I continue to be concerned that overloading text selection and
>> temporal alignment within the <video>/<audio> elements is,
>> architecturally, a bad idea. By adding explicit temporal structuring
>> (as is already done in HTML+Time and in scores of playlist formats),
>> the syntax for selecting and integrating text captions would not have
>> to be a special-purpose hack.
>> An example (based on HTML+Time syntax available within IE for over
>> 10 years) is:
>>
>> <div timeContainer="par" controls ... >
>>   <video ...>
>>     <source .../>
>>     ...
>>     <source .../>
>>   </video>
>>   <switch systemCaptions="true">
>>     <textstream src="xxx" systemLanguage="nl" ... />
>>     <textstream src="yyy" systemLanguage="fr" ... />
>>     <textstream src="zzz" ... /> <!-- default -->
>>   </switch>
>> </div>
>>
>> There is nothing complex about these solutions -- it simply means
>> making temporal synchronization explicit. It allows easy
>> extensibility for including image slideshows as alternatives to
>> video, or for providing different choices in the case of particular
>> screen sizes or connection speeds. Naturally, this is not an
>> accessibility-only issue, but history has shown that the community
>> of users with special needs is best served when a consistent
>> framework exists for managing multiple content alternatives.
>
> This changes the meaning of the <div> element in HTML and thus has
> wide-ranging implications. It will not be possible to solve it in
> this way.
>
>> I first wrote a position paper on this (with concrete suggestions)
>> four years ago and submitted it to the HTML lists, but it never got
>> on the HTML5 agenda. Since then, I've been told several times that
>> there is no time to come up with an appropriate solution for
>> developing a comprehensive model for inter-object synchronization
>> before HTML5 goes to last call. (I've been hearing this for about 2
>> years.) Yet, there is time to come up with non-extensible,
>> non-scalable solutions. There is even time to develop yet another
>> timed text model. In this light, I think that it is indefensible to
>> ignore structured time within HTML5.
>
> Can you show that what we are pursuing is non-extensible and
> non-scalable? I have not seen a single use case that would be
> inhibited by the current approach, and I would be curious to see and
> address one.
> On the contrary, I believe that inter-object synchronization, i.e.
> the creation of multimedia experiences that include multiple
> timelines, images, and user interaction as SMIL does, is at a higher
> level than what we are currently concerned with. We are focused
> solely on making the existing <audio> and <video> elements
> accessible. Once this is solved, it is entirely possible to introduce
> a new element that allows the composition of <audio>, <video>, <img>
> and other elements into a multimedia experience of SMIL dimensions.
> It is not clear to me whether Canvas might already solve this need,
> or whether indeed a SMIL-type element is necessary. This is the
> larger picture that I keep referring to and that would be very
> interesting to analyse with you. But I cannot see that what we are
> currently pursuing would interfere with or prohibit the solution of
> this larger picture. If you have an example, please do contribute.
>
> Best Regards,
> Silvia.

For the record, I agree with Silvia's assessment of the above issues. I
have taken part in the mailing list discussions and haven't seen
anything but dispassionate technical replies to Dick's concerns, from
Silvia or anyone else. I hope Dick might reconsider and continue
contributing use cases, suggestions and criticism on the existing and
emerging specs.

--
Philip Jägenstedt
Core Developer
Opera Software
Received on Wednesday, 5 May 2010 04:02:36 UTC