- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Sat, 4 Jun 2011 11:40:39 +1000
- To: www-archive@w3.org
It's there now: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/031916.html S. On Fri, Jun 3, 2011 at 6:21 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote: > Seems this mail was not archived at > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/ > Thus forwarding it for archiving. > Regards, > Silvia. > > On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson <ian@hixie.ch> wrote: >> >> (Note that I have tried to only reply to each suggestion once, so >> subsequent requests for the same feature are not included below.) >> >> (I apologise for the somewhat disorganised state of this e-mail. I >> normally try to group topics together, but the threads I'm responding to >> here jumped back and forth across different issues quite haphazardly and >> trying to put related things together broke some of the flow and context >> of the discussions, so I opted in several places to leave the context as >> it was originally presented, and just jump back and forth amongst the >> topics raised. Hopefully it's not too confusing.) >> >> On Thu, 9 Dec 2010, Silvia Pfeiffer wrote: >>> >> > > >>> >> > > Sure, but this is only a snippet of an actual application. If, >>> >> > > e.g., you want to step through a list of videos (maybe an >>> >> > > automated playlist) using script and you need to provide at least >>> >> > > two different formats with <source>, you'd want to run this >>> >> > > algorithm frequently. >>> >> > >>> >> > Just have a bunch of <video>s in the markup, and when one ends, >>> >> > hide it and show the next one. Don't start dynamically manipulating >>> >> > <source> elements, that's just asking for pain. >>> >> > >>> >> > If you really must do it all using script, just use canPlayType and >>> >> > the <video src=""> attribute, don't mess around with <source>. >>> >> >>> >> Thanks for adding that advice. I think it's important to point that >>> >> out. >>> > >>> > I can add it to the spec too if you think that would help. Where would >>> > a good place for it be? >>> >>> There is a note in the <source> element section that reads as follows: >>> "Dynamically modifying a source element and its attribute when the >>> element is already inserted in a video or audio element will have no >>> effect. To change what is playing, either just use the src attribute on >>> the media element directly, or call the load() method on the media >>> element after manipulating the source elements." >>> >>> Maybe you can add some advice there to use canPlayType to identify what >>> type of resource to add in the @src attribute on the media element. >>> Also, you should remove the last half of the second sentence in this >>> note if that is not something we'd like to encourage. >> >> Done. >> >> >> On Wed, 8 Dec 2010, Kevin Marks wrote: >>> >>> One case where posters come back after playback is complete is when >>> there are multiple videos on the page, and only one has playback focus >>> at a time, such as a page of preview movies for longer ones to purchase. >>> >>> In that case, showing the poster again on blur makes sense conceptually. >>> >>> It seems that getting back into the pre-playback state, showing the >>> poster again would make sense in this context. >>> >>> That would imply adding an unload() method that reverted to that state, >>> and could be used to make any cached media data purgeable in favour of >>> another video that is subsequently loaded. >> >> You don't need unload(), you can just use load(). It essentially resets >> the media element. 
>> >> It's not hugely efficient, but if we find people are trying to do this a >> lot, then we can add a more efficent variant that just resets the poster >> frame state, I guess. (I'd probably call it stop(), though, not unload().) >> >> >> On Thu, 9 Dec 2010, David Singer wrote: >>> >>> I think if you want that effect, you flip what's visible in an area of >>> the page between a playing video, and an image. Relying on the poster >>> is not effective, IMHO. >> >> I don't know, I think it would make semantic sense to have all the videos >> be <video> elements if they're actually going to be played right there. >> >> >> On Thu, 9 Dec 2010, Kevin Marks wrote: >>> >>> I know it's not effective at the moment; it is a common use case. >>> QuickTime had the 'badge' ux for years that hardly anyone took advantage >>> of: >>> >>> http://www.mactech.com/articles/mactech/Vol.16/16.02/Feb00QTToolkit/index.html >>> >>> What we're seeing on the web is a converged implementation of the >>> YouTube-like overlaid grey play button, but this is effectively >>> reimplemented independently by each video site that enables embedding. >>> >>> As we see HTML used declaratively for long-form works like ebooks on >>> lower performance devices, having embedded video that doesn't >>> cumulatively absorb all the memory available is going to be like the old >>> CD-ROM use cases the QT Badge was meant for. >> >> This seems like a presentational issue, for which CSS would be better >> positioned to provide a solution. >> >> >> On Thu, 9 Dec 2010, Boris Zbarsky wrote: >>> On 12/8/10 8:19 PM, Ian Hickson wrote: >>> > Boris wrote: >>> > > You can't sniff in a toplevel browser window. Not the same way that >>> > > people are sniffing in <video>. It would break the web. >>> > >>> > How so? >>> >>> People actually rely on the not-sniffing behavior of UAs in actual >>> browser windows in some cases. For example, application/octet-stream at >>> toplevel is somewhat commonly used to force downloads without a >>> corresponding Content-Disposition header (poor practice, but support for >>> Content-Disposition hasn't been historically great either). >>> >>> > (Note that the spec as it stands takes a compromise position: the >>> > content is only accepted if the Content-Type and type="" values are >>> > supported types (if present) and the content sniffs as a supported >>> > type, but nothing in the spec checks that all three values are the >>> > same.) >>> >>> Ah, I see. So similar to the way <img> is handled... >>> >>> I can't quite decide whether this is the best of both worlds, or the >>> worst. ;) >> >> Yeah, I hear ya. >> >> >>> It certainly makes it simpler to implement video by delegating to >>> QuickTime or the like, though I suspect such an implementation would >>> also end up sniffing types the UA doesn't necessarily claim to >>> support.... so maybe it's not simpler after all. >> >> Indeed. >> >> At this point I'm basically just waiting to see what implementations end >> up doing. When I tried moving us more towards sniffing, there was >> pushback; when I tried moving us more towards honouring types, there was >> equal and opposite pushback. So at this point, I'm letting the market >> decide it. 
:-) >> >> >> On Thu, 9 Dec 2010, Simon Pieters wrote: >>> On Thu, 09 Dec 2010 02:58:12 +0100, Ian Hickson <ian@hixie.ch> wrote: >>> > On Wed, 1 Sep 2010, Simon Pieters wrote: >>> > > >>> > > I think it might be good to run the media element load algorithm >>> > > when setting or changing src on <source> (that has a media element >>> > > as its parent), but not type and media (what's the use case for type >>> > > and media?). However it would fire an 'emptied' event for each >>> > > <source> that changed, which is kind of undesirable. Maybe the media >>> > > element load algorithm should only be invoked if src is set or >>> > > changed on a <source> that has no previous sibling <source> >>> > > elements? >>> > >>> > What's the use case? Just set .src before you insert the element. >>> >>> The use case under discussion is changing to another video. So the >>> element is already inserted and already has src. >>> >>> Something like: >>> >>> <video controls autoplay> >>> <source src=video1.webm type=video/webm> >>> <source src=video1.mp4 type=video/mp4> >>> </video> >>> <script> >>> function loadVideo(src) { >>> var video = document.getElementsByTagName('video')[0]; >>> sources = video.getElementsByTagName('source'); >>> sources[0].src = src + '.webm'; >>> sources[1].src = src + '.mp4'; >>> } >>> </script> >>> <input type="button" value="See video 1" onclick="loadVideo('video1')"> >>> <input type="button" value="See video 2" onclick="loadVideo('video2')"> >>> <input type="button" value="See video 3" onclick="loadVideo('video3')"> >> >> Well if you _really_ want to do that, just call video.load() at the end of >> loadVideo(). But really, you're better off poking around with >> canPlayType() and setting video.src directly instead of using <source> >> for these dynamic cases. >> >> >> On Thu, 9 Dec 2010, Kevin Carle wrote something more or less like: >>> >>> function loadVideo(src) { >>> var video = document.getElementsByTagName('video')[0]; >>> if (video.canPlayType("video/webm") != "") >>> video.src = src + '.webm'; >>> else >>> video.src = src + '.mp4'; >>> } >> >> Yeah. >> >> And hopefully this will become moot when there's a common video format, >> anyway. >> >> >> On Fri, 10 Dec 2010, Simon Pieters wrote: >>> >>> You'd need to remove the <source> elements to keep the document valid. >> >> You don't need them in the first place if you're doing things by script, >> as far as I can tell. >> >> >>> The author might want to have more than two <source>s, maybe with >>> media="", onerror="" etc. Then it becomes simpler to rely on the >>> resource selection algorithm. >> >> It's hard to comment without seeing a concrete use case. >> >> >> On Tue, 14 Dec 2010, Philip J盲genstedt wrote: >>> On Wed, 24 Nov 2010 17:11:02 +0100, Eric Winkelman <E.Winkelman@cablelabs.com> >>> wrote: >>> > >>> > I'm investigating how TimedTracks can be used for in-band-data-tracks >>> > within MPEG transport streams (used for cable television). >>> > >>> > In this format, the number and types of in-band-data-tracks can change >>> > over time. So, for example, when the programming switches from a >>> > football game to a movie, an alternate language track may appear that >>> > wasn't there before. Later, when the programming changes again, that >>> > language track may be removed. >>> > >>> > It's not clear to me how these changes are exposed by the proposed >>> > Media Element events. 
>>> >>> The thinking is that you switch between different streams by setting the >>> src="" attribute to point to another stream, in which case you'll get an >>> emptied event along with another bunch of events. If you have a single >>> source where audio/video/text streams appear and disappear, there's not >>> really any way to handle it. >> >> As specified, there's no way for a media element's in-band text tracks to >> change after the 'loadedmetadata' event has fired. >> >> >>> > The "loadedmetadata" event is used to indicate that the TimedTracks >>> > are ready, but it appears that it is only fired before playback >>> > begins. Is this event fired again whenever a new track is discovered? >>> > Is there another event that is intended for this situation? >>> > >>> > Similarly, is there an event that indicates when a track has been >>> > removed? Or is this also handled by the "loadedmetadata" event >>> > somehow? >>> >>> No, the loadedmetadata event is only fired once per resource, it's not >>> the event you're looking for. >>> >>> As for actual solutions, I think that having loadedmetadata fire again >>> if the number or type of streams change would make some sense. >> >> It would be helpful to know more about these cases where there are dynamic >> changes to the audio, video, or text tracks. Does this really happen on >> the Web? Do we need to handle it? >> >> >> On Thu, 16 Dec 2010, Silvia Pfeiffer wrote: >>> >>> I do not know how technically the change of stream composition works in >>> MPEG, but in Ogg we have to end a current stream and start a new one to >>> switch compositions. This has been called "sequential multiplexing" or >>> "chaining". In this case, stream setup information is repeated, which >>> would probably lead to creating a new steam handler and possibly a new >>> firing of "loadedmetadata". I am not sure how chaining is implemented in >>> browsers. >> >> Per spec, chaining isn't currently supported. The closest thing I can find >> in the spec to this situation is handling a non-fatal error, which causes >> the unexpected content to be ignored. >> >> >> On Fri, 17 Dec 2010, Eric Winkelman wrote: >>> >>> The short answer for changing stream composition is that there is a >>> Program Map Table (PMT) that is repeated every 100 milliseconds and >>> describes the content of the stream. Depending on the programming, the >>> stream's composition could change entering/exiting every advertisement. >> >> If this is something that browser vendors want to support, I can specify >> how to handle it. Anyone? >> >> >> On Sat, 18 Dec 2010, Robert O'Callahan wrote: >>> >>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#dom-media-duration says: >>> [...] >>> >>> What if the duration is not currently known? >> >> The user agent must determine the duration of the media resource before >> playing any part of the media data and before setting readyState to a >> value equal to or greater than HAVE_METADATA, even if doing so requires >> fetching multiple parts of the resource. >> >> >>> I think in general it will be very difficult for a user-agent to know >>> that a stream is unbounded. In Ogg or WebM a stream might not contain an >>> explicit duration but still eventually end. Maybe it would make more >>> sense for the last sentence to read "If the media resource is not known >>> to be bounded, ..." >> >> Done. >> >> >> On Sat, 18 Dec 2010, Philip J盲genstedt wrote: >>> >>> Agreed, this is how I've interpreted the spec already. 
If a server >>> replies with 200 OK instead of 206 Partial Content and the duration >>> isn't in the header of the resource, then the duration is reported to be >>> Infinity. If the resource eventually ends another durationchange event >>> is fired and the duration is reported to be the (now known) length of >>> the resource. >> >> That's fine. >> >> >> On Mon, 20 Dec 2010, Robert O'Callahan wrote: >>> >>> That sounds good to me. We'll probably do that. The spec will need to be >>> changed though. >> >> I changed it as you suggest above. >> >> >> On Fri, 31 Dec 2010, Bruce Lawson wrote: >>> > On Fri, 5 Nov 2010, Bruce Lawson wrote: >>> > > >>> > > http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#sourcing-in-band-timed-tracks >>> > > says to create TimedTrack objects etc for in-band tracks which are >>> > > then exposed in the API - so captions/subtitles etc that are >>> > > contained in the media container file are exposed, as well as those >>> > > tracks pointed to by the <track> element. >>> > > >>> > > But >>> > > http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#timed-track-api >>> > > implies that the array is only of tracks in the track element: >>> > > >>> > > "media . tracks . length >>> > > >>> > > Returns the number of timed tracks associated with the media element >>> > > (e.g. from track elements). This is the number of timed tracks in >>> > > the media element's list of timed tracks." >>> > >>> > I don't understand why you interpret this as implying anything about >>> > the track element. Are you interpreting "e.g." as "i.e."? >>> > >>> > > Suggestion: amend to say "Returns the number of timed tracks >>> > > associated with the media element (e.g. from track elements and any >>> > > in-band track files inside the media container file)" or some such. >>> > >>> > I'd rather avoid talking about the in-band ones here, in part because >>> > I think it's likely to confuse authors at least as much as help them, >>> > and in part because the terminology around in-band timed tracks is a >>> > little unclear to me and so I'd rather not talk about them in >>> > informative text. :-) >>> > >>> > If you disagree, though, let me know. I can find a way to make it >>> > work. >>> >>> I disagree, but not aggressively vehemently. My confusion was conflating >>> "track elements" with the three instances of the phrase "timed tracks" >>> in close proximity. >>> >>> I suggest that "Returns the number of timed tracks associated with the >>> media element (i.e. from track elements and any packaged along with the >>> media in its container file)" would be clearer and avoid use of the >>> confusing phrase "in-band tracks". >> >> That's still confusing, IMHO. "Packaged" doesn't imply in-band; most >> subtitle files are going to be "packaged" with the video even if they're >> out of band. >> >> Also, your 'i.e.' here is wrong. There's at least one other source of >> tracks: the ones added by the script. >> >> The non-normative text is intentionally not overly precise, because if it >> was precise it would just be the same as the normative text and wouldn't >> be any simpler, defeating its entire purpose. >> >> >> On Mon, 3 Jan 2011, Philip J盲genstedt wrote: >>> > >>> > + I've added a magic string that is required on the format to make it >>> > recognisable in environments with no or unreliable type labeling. >>> >>> Is there a reason it's "WEBVTT FILE" instead of just "WEBVTT"? "FILE" >>> seems redundant and like unnecessary typing to me. 
>> >> It seemed more likely that non-WebVTT files would start with a line that >> said just "WEBVTT" than a line that said just "WEBVTT FILE". But I guess >> "WEBVTT FILE FORMAT" is just as likely and it'll be caught. >> >> I've changed it to just "WEBVTT"; there may be existing implementations >> that only accept "WEBVTT FILE" so for now I recommend that authors still >> use the longer header. >> >> >>> > On Wed, 8 Sep 2010, Philip J盲genstedt wrote: >>> > > >>> > > In the discussion on public-html-a11y <trackgroup> was suggested to >>> > > group together mutually exclusive tracks, so that enabling one >>> > > automatically disables the others in the same trackgroup. >>> > > >>> > > I guess it's up to the UA how to enable and disable <track>s now, >>> > > but the only option is making them all mutually exclusive (as >>> > > existing players do) or a weird kind of context menu where it's >>> > > possible to enable and disable tracks completely independently. >>> > > Neither options is great, but as a user I would almost certainly >>> > > prefer all tracks being mutually exclusive and requiring scripts to >>> > > enable several at once. >>> > >>> > It's not clear to me what the use case is for having multiple groups >>> > of mutually exclusive tracks. >>> > >>> > The intent of the spec as written was that a browser would by default >>> > just have a list of all the subtitle and caption tracks (the latter >>> > with suitable icons next to them, e.g. the [CC] icon in US locales), >>> > and the user would pick one (or none) from the list. One could easily >>> > imagine a UA allowing the user to enable multiple tracks by having the >>> > user ctrl-click a menu item, though, or some similar solution, much >>> > like with the commonly seen select box UI. >>> >>> In the vast majority of cases, all tracks are intended to be mutually >>> exclusive, such as English+English HoH or subtitles in different >>> languages. No media player UI (hardware or software) that I have ever >>> used allows enabling multiple tracks at once. Without any kind of hint >>> about which tracks make sense to enable together, I can't see desktop >>> Opera allowing multiple tracks (of the same kind) to be enabled via the >>> main UI. >> >> Personally I think it's quite reasonable to want to see two languages at >> once, or even two forms of the same language at once, especially for, >> e.g., reviewing subtitles. But I don't think it would be a bad thing if >> some browsers didn't expose that in the UI; that's something that could >> be left to bookmarklets, for example. >> >> >>> Using this syntax, I would expect some confusion when you omit the closing >>> </v>, when it's *not* a cue spoken by two voices at the same time, such as: >>> >>> <v Jim>- Boo! >>> <v Bob>- Gah! >>> >>> Gah! is spoken by both Jim and Bob, but that was likely not intended. If >>> this causes confusion, we should make validators warn about multiple >>> voices with with no closing </v>. >> >> No need to just warn, the spec says the above is outright invalid, so >> they would raise an error. >> >> >>> > > For captions and subtitles it's less common, but rendering it >>> > > underneath the video rather than on top of it is not uncommon, e.g. >>> > > http://nihseniorhealth.gov/video/promo_qt300.html or >>> > >>> > Conceptually, that's in the video area, it's just that the video isn't >>> > centered vertically. I suppose we could allow UAs to do that pretty >>> > easily, if it's commonly desired. 
>>> >>> It's already possible to align the video to the top of its content box >>> using <http://dev.w3.org/csswg/css3-images/#object-position>: >>> >>> video { object-position: center top } >>> >>> (This is already supported in Opera, but prefixed: -o-object-position) >> >> Sounds good. >> >> >>> Note that in Sweden captioning for the HoH is delivered via the teletext >>> system, which would allow ASCII-art to be displayed. Still, I've never >>> seen it. The only case of graphics being used in "subtitles" I can >>> remember ever seeing is the DVD of >>> <http://en.wikipedia.org/wiki/Cat_Soup>, where the subtitle system is >>> (ab)used to overlay some graphics. >> >> Yeah, I'm not at all concerned about not supporting graphics in subtitles. >> It's nowhere near the 80% bar. >> >> >>> If we ever want comments, we need to add support in the parser before >>> any content accidentally uses the syntax, in other words pretty soon >>> now. >> >> No, we can use any syntax that the parser currently ignores. It won't >> break backwards compat with content that already uses it by then, since >> the whole point of comments is to be ignored. The only difference is >> whether validators complain or not. >> >> >>> > On Tue, 14 Sep 2010, Anne van Kesteren wrote: >>> > > >>> > > Apart from text/plain I cannot think of a "web" text format that >>> > > does not have comments. >>> > >>> > But what's the use case? Is it really useful to have comments in a >>> > subtitle file? >>> >>> Being able to put licensing/contact information at the top of the file >>> would be useful, just as it is in JavaScript/CSS. >> >> Well the parser explicitly skips over anything in the header block >> (everything up to the first blank line IIRC), so if we find that people >> want this then we can allow it without having to change any UAs except the >> validators. >> >> >>> > On Fri, 22 Oct 2010, Simon Pieters wrote: >>> > > > >>> > > > It can still be inspired by it though so we don't have to change >>> > > > much. I'd be curious to hear what other things you'd clean up >>> > > > given the chance. >>> > > >>> > > WebSRT has a number of quirks to be compatible with SRT, like >>> > > supporting both comma and dot as decimal separators, the weird >>> > > parsing of timestamps, etc. >>> > >>> > I've cleaned the timestamp parsing up. I didn't see others. >>> >>> I consider the cue id line (the line preceding the timing line) to be >>> cruft carried over from SRT. When we now both have classes and the >>> possibility of getting a cue by index, so why do we need it? >> >> It's optional, but it is useful, especially for metadata tracks, as a way >> to grab specific cues. For example, consider a metadata or chapter track >> that contains cues with specific IDs that the site would use to jump to >> particular parts of the video in response to key presses, such as "start >> of content after intro", or maybe for a podcast with different segments, >> where the user can jump to "news" and "reviews" and "final thought" -- you >> need an ID to be able to find the right cue quickly. >> >> >>> > > There was also some discussion about metadata. Language is sometimes >>> > > necessary for the font engine to pick the right glyph. >>> > >>> > Could you elaborate on this? My assumption was that we'd just use CSS, >>> > which doesn't rely on language for this. 
>>> >>> It's not in any spec that I'm aware of, but some browsers (including >>> Opera) pick different glyphs depending on the language of the text, >>> which really helps when rendering CJK when you have several CJK fonts on >>> the system. Browsers will already know the language from <track >>> srclang>, so this would be for external players. >> >> How is this problem solved in SRT players today? >> >> >> On Mon, 14 Feb 2011, Philip J盲genstedt wrote: >>> >>> Given that most existing subtitle formats don't have any language >>> metadata, I'm a bit skeptical. However, if implementors of non-browser >>> players want to implement WebVTT and ask for this I won't stand in the >>> way (not that I could if I wanted to). For simplicity, I'd prefer the >>> language metadata from the file to not have any effect on browsers >>> though, even if no language is given on <track>. >> >> Indeed. >> >> >> On Tue, 4 Jan 2011, Alex Bishop wrote: >>> >>> Firefox too. If you visit >>> http://people.mozilla.org/~jdaggett/webfonts/serbianglyphs.html in >>> Firefox 4, the text explicitly marked-up as being Serbian Cyrillic >>> (using the lang="sr-Cyrl" attribute) uses some different glyphs to the >>> text with no language metadata. >> >> This seems to be in violation of CSS; we should probably fix it there >> before fixing it in WebVTT since WebVTT relis on CSS. >> >> >> On Mon, 3 Jan 2011, Philip J盲genstedt wrote: >>> >>> > > * The "bad cue" handling is stricter than it should be. After >>> > > collecting an id, the next line must be a timestamp line. Otherwise, >>> > > we skip everything until a blank line, so in the following the >>> > > parser would jump to "bad cue" on line "2" and skip the whole cue. >>> > > >>> > > 1 >>> > > 2 >>> > > 00:00:00.000 --> 00:00:01.000 >>> > > Bla >>> > > >>> > > This doesn't match what most existing SRT parsers do, as they simply >>> > > look for timing lines and ignore everything else. If we really need >>> > > to collect the id instead of ignoring it like everyone else, this >>> > > should be more robust, so that a valid timing line always begins a >>> > > new cue. Personally, I'd prefer if it is simply ignored and that we >>> > > use some form of in-cue markup for styling hooks. >>> > >>> > The IDs are useful for referencing cues from script, so I haven't >>> > removed them. I've also left the parsing as is for when neither the >>> > first nor second line is a timing line, since that gives us a lot of >>> > headroom for future extensions (we can do anything so long as the >>> > second line doesn't start with a timestamp and "-->" and another >>> > timestamp). >>> >>> In the case of feeding future extensions to current parsers, it's way >>> better fallback behavior to simply ignore the unrecognized second line >>> than to discard the entire cue. The current behavior seems unnecessarily >>> strict and makes the parser more complicated than it needs to be. My >>> preference is just ignore anything preceding the timing line, but even >>> if we must have IDs it can still be made simpler and more robust than >>> what is currently spec'ed. >> >> If we just ignore content until we hit a line that happens to look like a >> timing line, then we are much more constrained in what we can do in the >> future. For example, we couldn't introduce a "comment block" syntax, since >> any comment containing a timing line wouldn't be ignored. 
On the other >> hand if we keep the syntax as it is now, we can introduce a comment block >> just by having its first line include a "-->" but not have it match the >> timestamp syntax, e.g. by having it be "--> COMMENT" or some such. >> >> Looking at the parser more closely, I don't really see how doing anything >> more complex than skipping the block entirely would be simpler than what >> we have now, anyway. >> >> >> On Mon, 3 Jan 2011, Glenn Maynard wrote: >>> >>> By the way, the WebSRT hit from Google >>> (http://www.whatwg.org/specs/web-apps/current-work/websrt.html) is 404. >>> I've had to read it out of the Google cache, since I'm not sure where it >>> went. >> >> I added a redirect. >> >> >>> Inline comments (not just line comments) in subtitles are very important >>> for collaborative editing: for leaving notes about a translation, noting >>> where editing is needed or why a change was made, and so on. >>> >>> If a DOM-like interface is specified for this (presumably this will >>> happen later), being able to access inline comments like DOM comment >>> nodes would be very useful for visual editors, to allow displaying >>> comments and to support features like "seek to next comment". >> >> We can add comments pretty easily (e.g. we could say that "<!" starts a >> comment and ">" ends it -- that's already being ignored by the current >> parser), if people really need them. But are comments really that useful? >> Did SRT have problem due to not supporting inline comments? (Or did it >> support inline comments?) >> >> >> On Tue, 4 Jan 2011, Glenn Maynard wrote: >>> On Tue, Jan 4, 2011 at 4:24 AM, Philip J盲genstedt <philipj@opera.com> >>> wrote: >>> > If you need an intermediary format while editing, you can just use any >>> > syntax you like and have the editor treat it specially. >>> >>> If I'd need to write my own parser to write an editor for it, that's one >>> thing--but I hope I wouldn't need to create yet another ad hoc caption >>> format, mirroring the features of this one, just to work around a lack >>> of inline comments. >> >> An editor would need a custom parser anyway to make sure it round-tripped >> syntax errors, presumably. >> >> >>> The cue text already vaguely resembles HTML. What about <!-- comments >>> -->? It's universally understood, and doesn't require any new escape >>> mechanisms. >> >> The current parser would end a comment at the first ">", but so long as >> you didn't have a ">" in the comment, "<!--...-->" would work fine within >> cue text. (We would have to be careful in standalone blocks to define it >> in such a way that it could not be confused with a timing line.) >> >> >> On Wed, 5 Jan 2011, Philip J盲genstedt wrote: >>> >>> The question is rather if the comments should be exposed as DOM comment >>> nodes in getCueAsHTML, which seems to be what you're asking for. That >>> would only be possible if comments were only allowed inside the cue >>> text, which means that you couldn't comment out entire cues, as such: >>> >>> 00:00.000 --> 00:01.000 >>> one >>> >>> /* >>> 00:02.000 --> 00:03.000 >>> two >>> */ >>> >>> 00:04.000 --> 00:05.000 >>> three >>> >>> Therefore, my thinking is that comments should be removed during parsing >>> and not be exposed to any layer above it. >> >> We can support both, if there's really demand for it. >> >> For example: >> >> 00:00.000 --> 00:01.000 >> one <! inline comment > one >> >> COMMENT--> >> 00:02.000 --> 00:03.000 >> two; this is entirely >> commented out >> >> <! 
this is the ID line >> 00:04.000 --> 00:05.000 >> three; last line is a ">" >> which is part of the cue >> and is not a comment. >> > >> >> The above would work today in a conforming UA. The question really is what >> parts of this do we want to support and what do we not care enough about. >> >> >> On Wed, 5 Jan 2011, Anne van Kesteren wrote: >>> On Wed, 05 Jan 2011 10:58:56 +0100, Philip J盲genstedt >>> <philipj@opera.com> wrote: >>> > Therefore, my thinking is that comments should be removed during >>> > parsing and not be exposed to any layer above it. >>> >>> CSS does that too. It has not caused problems so far. It does mean >>> editing tools need a slightly different DOM, but that is always the case >>> as they want to preserve whitespace details, etc., too. At least editors >>> that have both a text and visual interface. >> >> Right. >> >> >> On Fri, 14 Jan 2011, Silvia Pfeiffer wrote: >>> >>> We are concerned, however, about the introduction of WebVTT as a >>> universal captioning format *when used outside browsers*. Since a subset >>> of CSS features is required to bring HTML5 video captions on par with TV >>> captions, non-browser applications will need to support these CSS >>> features, too. However, we do not believe that external CSS files are an >>> acceptable solution for non-browser captioning and therefore think that >>> those CSS features (see [1]) should eventually be made part of the >>> WebVTT specification. >>> >>> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#the-'::cue'-pseudo-element >> >> I'm not sure what you mean by "made part of the WebVTT specification", but >> if you mean that WebVTT should support inline CSS, that does seem line >> something we can add, e.g. using syntax like this: >> >> WEBVTT >> >> STYLE--> >> ::cue(v[voice=Bob]) { color: green; } >> ::cue(c.narration) { font-style: italic; } >> ::cue(c.narration i) { font-style: normal; } >> >> 00:00.000 --> 00:02.000 >> Welcome. >> >> 00:02.500 --> 00:05.000 >> To WebVTT. >> >> I suggest we wait until WebVTT and '::cue' in particular have shipped in >> at least one browser and been demonstrated as being useful before adding >> this kind of feature though. >> >> >>> 1. Introduce file-wide metadata >>> >>> WebVTT requires a structure to add header-style metadata. We are here >>> talking about lists of name-value pairs as typically in use for header >>> information. The metadata can be optional, but we need a defined means >>> of adding them. >>> >>> Required attributes in WebVTT files should be the main language in use >>> and the kind of data found in the WebVTT file - information that is >>> currently provided in the <track> element by the @srclang and @kind >>> attributes. These are necessary to allow the files to be interpreted >>> correctly by non-browser applications, for transcoding or to determine >>> if a file was created as a caption file or something else, in particular >>> the @kind=metadata. @srclang also sets the base directionality for BiDi >>> calculations. >>> >>> Further metadata fields that are typically used by authors to keep >>> specific authoring information or usage hints are necessary, too. As >>> examples of current use see the format of MPlayer mpsub’s header >>> metadata [2], EBU STL’s General Subtitle Information block [3], and >>> even CEA-608’s Extended Data Service with its StartDate, Station, >>> Program, Category and TVRating information [4]. 
Rather than specifying a >>> specific subset of potential fields we recommend to just have the means >>> to provide name-value pairs and leave it to the negotiation between the >>> author and the publisher which fields they expect of each other. >>> >>> [2] http://www.mplayerhq.hu/DOCS/tech/mpsub.sub >>> [3] https://docs.google.com/viewer?a=v&q=cache:UKnzJubrIh8J:tech.ebu.ch/docs/tech/tech3264.pdf >>> [4] http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/47cfr15.119.pdf >> >> I don't understand the use cases here. >> >> CSS and JS don't have anything like this, why should WebVTT? What problem >> is this solving? How did SRT solve this problem? >> >> >>> 2. Introduce file-wide cue settings >>> >>> At the moment if authors want to change the default display of cues, >>> they can only set them per cue (with the D:, S:, L:, A: and T:. cue >>> settings) or have to use an external CSS file through a HTML page with >>> the ::cue pseudo-element. In particular when considering that all >>> Asian language files would require a “D:vertical” marker, it becomes >>> obvious that this replication of information in every cue is >>> inefficient and a waste of bandwidth, storage, and application speed. >>> A cue setting default section should be introduced into a file >>> header/setup area of WebVTT which will avoid such replication. >>> >>> An example document with cue setting defaults in the header could look >>> as follows: >>> == >>> WEBVTT >>> Language=zh >>> Kind=Caption >>> CueSettings= A:end D:vertical >>> >>> 00:00:15.000 --> 00:00:17.950 >>> 在左边我们可以看到... >>> >>> 00:00:18.160 --> 00:00:20.080 >>> 在右边我们可以看到... >>> >>> 00:00:20.110 --> 00:00:21.960 >>> ...捕蝇草械. >>> == >>> >>> Note that you might consider that the solution to this problem is to use >>> external CSS to specify a change to all cues. However, this is not >>> acceptable for non-browser applications and therefore not an acceptable >>> solution to this problem. >> >> Adding defaults seems like a reasonable feature. We could add this just by >> adding the ability to have a block in a VTT file like this: >> >> WEBVTT >> >> DEFAULTS --> A:vertical A:end >> >> 00:00.000 --> 00:02.000 >> This is vertical and end-aligned. >> >> 00:02.500 --> 00:05.000 >> As is this. >> >> DEFAULTS --> A:start >> >> 00:05.500 --> 00:07.000 >> This is horizontal and start-aligned. >> >> However, again I suggest that we wait until WebVTT has been deployed in at >> least one browser before adding more features like this. >> >> >>> * positioning: Generally the way in which we need positioning to work is >>> to provide an anchor position for the text and then explain in which >>> direction font size changes and the addition of more text allows the >>> text segment to grow. It seems that the line position cue (L) provides a >>> baseline position and the alignment cue (A) provides the growing >>> direction start/middle/end. Can we just confirm this understanding? >> >> It's more the other way around: the line boxes are laid out and then the >> resulting line boxes are positioned according to the A: and L: lines. In >> particular, the L: lines when given with a % character position the line >> boxes in the same manner that CSS background-position positions the >> background image, and L: lines without a % character set the position of >> the line boxes based on the height of the first line box. A: lines then >> just set the position of these line boxes relative to the other dimension. 
>> >> >>> * fontsize: When changing text size in relation to the video changing >>> size or resolution, we need to make sure not to reduce the text size >>> below a specific font size for readability reasons. And we also need to >>> make sure not to make it larger than a specific font size, since >>> otherwise it will dominate the display. We usually want the text to be >>> at least Xpx, but no bigger than Ypx. Also, one needs to pay attention >>> to the effect that significant player size changes have on relative >>> positioning - in particular for the minimum caption text size. Dealing >>> with min and max sizes is missing from the current specification in our >>> understanding. >> >> That's a CSS implementation issue. Minimum font sizes are commonly >> supported in CSS implementations. Maximum font sizes would be similar. >> >> >>> * bidi text: In our experience from YouTube, we regularly see captions >>> that contain mixed languages/directionality, such as Hebrew captions >>> that have a word of English in it. How do we allow for bidi text inside >>> cues? How do we change directionality mid-cue? Do we deal with the >>> zero-width LTR-mark and RTL-mark unicode characters? It would be good to >>> explain how these issues are dealt with in WebVTT. >> >> There's nothing special about how they work in WebVTT; they are handled >> the same as in CSS. >> >> >>> * internationalisation: D:vertical and D:vertical-lr seem to only work >>> for vertical text - how about horizontal-rl? For example, Hebrew is a >>> prime example of a language being written from right to left >>> horizontally. Is that supported and how? >> >> What exactly would horizontal-rl do? >> >> >>> * naming: The usage of single letter abbreviations for cue settings has >>> created quite a discussion here at Google. We all agree that file-wide >>> cue settings are required and that this will reduce the need for >>> cue-specific cue settings. We can thus afford a bit more readability in >>> the cue settings. We therefore believe that it would be better if the >>> cue settings were short names rather than single letter codes. This >>> would be more like CSS, too, and easier to learn and get right. In the >>> interface description, the 5 dimensions have proper names which could be >>> re-used (“direction”, “linePosition”, “textPosition”, “size” and >>> “align"). We therefore recommend replacing the single-letter cue >>> commands with these longer names. >> >> That would massively bloat these files and make editing them a huge pain, >> as far as I can tell. I agree that defaults would make it better, but many >> cues would still need their own positioning and sizing information, and >> anything beyond a very few letters would IMHO quickly become far too >> verbose for most people. "L", "A", and "S" are pretty mnemonic, "T" would >> quickly become familiar to people writing cues, and "D" is only going to >> be relevant to some authors but for those authors it's pretty >> self-explanatory as well, since the value is verbose. >> >> What I really would like to do is use "X" and "Y" instead of "T" and "L", >> but those terms would be very confusing when we flip the direction, which >> is why I used the less obvious "T" and "L". >> >> >>> * textcolor: In particular on European TV it is common to distinguish >>> between speakers by giving their speech different colors. 
The following >>> colors are supported by EBU STL, CEA-608 and CEA-708 and should be >>> supported in WebVTT without the use of external CSS: black, red, green, >>> yellow, blue, magenta, cyan, and white. As default we recommend white on >>> a grey transparent background. >> >> This is supported as 'color' and 'background'. >> >> >>> * underline: EBU STL, CEA-608 and CEA-708 support underlining of >>> characters. >> >> I've added support for 'text-decoration'. >> >> >>> The underline character is also particularly important for some Asian >>> languages. >> >> Could you elaborate on this? >> >> >>> Please make it possible to provide text underlines without the use of >>> CSS in WebVTT. >> >> Why without CSS? >> >> >>> * blink: As much as we would like to discourage blinking subtitles, they >>> are actually a core requirement for EBU STL and CEA-608/708 captions and >>> in use in particular for emergency messages and similar highly important >>> information. Blinking can be considered optional for implementation, but >>> we should allow for it in the standard. >> >> This is part of 'text-decoration'. >> >> >>> * font face: CEA-708 provides a choice of eight font tags: undefined, >>> monospaced serif, proportional serif, monospaced sans serif, >>> proportional sans serif, casual, cursive, small capital. These fonts >>> should be available for WebVTT as well. Is this the case? >> >> Yes. >> >> >>> We are not sure about the best solution to these needs. Would it be best >>> to introduce specific tags for these needs? >> >> CSS seems to handle these needs adequately. >> >> >>> We have a couple of recommendations for changes mostly for aesthetic and >>> efficiency reasons. We would like to point out that Google is very >>> concerned with the dense specification of data and every surplus >>> character, in particular if it is repeated a lot and doesn’t fulfill a >>> need, should be removed to reduce the load created on worldwide >>> networking and storage infrastructures and help render Web pages faster. >> >> This seems to contradict your earlier request to make the languge more >> verbose... >> >> >>> * Time markers: WebVTT time stamps follow no existing standard for time >>> markers. Has the use of NPT as introduced by RTSP[5] for time markers >>> been considered (in particular npt-hhmmss)? >>> >>> [5] http://www.ietf.org/rfc/rfc2326.txt >> >> WebVTT follows the SRT format, with commas replaced by periods for >> consistency with the rest of the platform. >> >> >>> * Suggest dropping “-->”: In the context of HTML, “-->” is an end >>> comment marker. It may confuse Web developers and parsers if such a sign >>> is used as a separator. For example, some translation tools expect HTML >>> or XML-based interchange formats and interpret the “>” as part of a >>> tag. Also, common caption convention often uses “>” to represent >>> speaker identification. Thus it is more difficult to write a filter >>> which correctly escapes “-->” but retains “>” for speaker ID. >> >> "-->" seems pretty mnemonic to me. I don't see why we'd want to drop it. >> >> >>> * Duration specification: WebVTT time stamps are always absolute time >>> stamps calculated in relation to the base time of synchronisation with >>> the media resource. While this is simple to deal with for machines, it >>> is much easier for hand-created captions to deal with relative time >>> stamps for cue end times and for the timestamp markers within cues. Cue >>> start times should continue to stay absolute time stamps. 
Timestamp >>> markers within cues should be relative to the cue start time. Cue end >>> times should be possible to be specified either as absolute or relative >>> timestamps. The relative time stamps could be specified through a prefix >>> of “+” in front of a “ss.mmm” second and millisecond specification. >>> These are not only simpler to read and author, but are also more compact >>> and therefore create smaller files. >> >> I think if anything is absolute, it doesn't really make anything much >> simpler for anything else to be relative, to be honest. Take the example >> you give here: >> >>> An example document with relative timestamps is: >>> == >>> WEBVTT >>> Language=en >>> Kind=Subtitle >>> >>> 00:00:15.000 +2.950 >>> At the left we can see... >>> >>> 00:00:18.160 +1.920 >>> At the right we can see the... >>> >>> 00:00:20.110 +1.850 >>> ...the <+0.400>head-<+0.800>snarlers >>> == >> >> If the author were to change the first time stamp because the video gained >> a 30 second advertisement at the start, then he would still need to change >> the hundreds of subseqent timestamps for all the additional cues. What >> does the author gain from not having to change the relative stamps? It's >> not like he's going to be doing it by hand, and once a tool is involved, >> the tool can change everything just as easily. >> >> >>> We are happy to see the introduction of the magic file identifier for >>> WebVTT which will make it easier to identify the file format. We do not >>> believe the “FILE” part of the string is necessary. >> >> I have removed it. >> >> >>> However, we recommend to also introduce a format version number that the >>> file adheres to, e.g. “WEBVTT 0.7”. >> >> Version numbers are an antipattern on the Web, so I have not added one. >> >> >>> This helps to make non-browser systems that parse such files become >>> aware of format changes. >> >> The format will never change in a non-backwards-compatible fashion once it >> is deployed, so that is not a concern. >> >> >>> It can also help identify proprietary standard metadata sets as used by >>> a specific company, such as “WEBVTT 0.7 ABC-meta1” which could signify >>> that the file adheres to WEBVTT 0.7 format specification with the >>> ABC-meta1 metadata schema. >> >> If we add metadata, then that can be handled just by having the metadata >> include that information itself. >> >> >>> CEA-708 captions support automatic line wrapping in a more sophisticated >>> way than WebVTT -- see http://en.wikipedia.org/wiki/CEA-708#Word_wrap. >>> >>> In our experience with YouTube we have found that in certain situations >>> this type of automatic line wrapping is very useful. Captions that were >>> authored for display in a full-screen video may contain too many words >>> to be displayed fully within the actual video presentation (note that >>> mobile / desktop / internet TV devices may each have a different amount >>> of space available, and embedded videos may be of arbitrary sizes). >>> Furthermore, user-selected fonts or font sizes may be larger than >>> expected, especially for viewers who need larger print. >>> >>> WebVTT as currently specified wraps text at the edge of their containing >>> blocks, regardless of the value of the 'white-space' property, even if >>> doing so requires splitting a word where there is no line breaking >>> opportunity. This will tend to create poor quality captions. 
For >>> languages where it makes sense, line wrapping should only be possible at >>> carriage return, space, or hyphen characters, but not on >>> characters. (Note that CEA-708 also contains non-breaking space and >>> non-breaking transparent space characters to help control wrapping.) >>> However, this algorithm will not necessarily work for all languages. >>> >>> We therefore suggest that a better solution for line wrapping would be >>> to use the existing line wrapping algorithms of browsers, which are >>> presumably already language-sensitive. >>> >>> [Note: the YouTube line wrapping algorithm goes even further by >>> splitting single caption cues into multiple cues if there is too much >>> text to reasonably fit within the area. YouTube then adjusts the times >>> of these caption cues so they appear sequentially. Perhaps this could >>> be mentioned as another option for server-side tools.] >> >> I've adjusted the text in the spec to more clearly require that >> line-breaking follow normal CSS rules but with the additional requirement >> that there not be overflow, which is what I had intended. >> >> >>> 1. Pop-on/paint-on/roll-up support >>> >>> Three different types of captions are common on TV: pop-on, roll-up and >>> paint-on. Captions according to CEA-608/708 need to support captions of >>> all three of these types. We believe they are already supported in >>> WebVTT, but see a need to re-confirm. >>> >>> For pop-on captions, a complete caption cue is timed to appear at a >>> certain time and disappear a few seconds later. This is the typical way >>> in which captions are presented and also how WebVTT/<track> works in our >>> understanding. Is this correct? >> >> As far as I understand, yes. >> >> >>> For roll-up captions, individual lines of captions are presented >>> successively with older lines moving up a line to make space for new >>> lines underneath. Assuming we understand the WebVTT rendering rules >>> correctly, WebVTT would identify each of these lines as an individual, >>> but time-overlapping cue with the other cues. As more cues are created >>> and overlap in time, newer cues are added below the currently visible >>> ones and move the currently visible ones up, basically creating a >>> roll-up effect. If this is a correct understanding, then this is an >>> acceptable means of supporting roll-up captions. >> >> I am not aware of anything currently in the WebVTT specification which >> will cause a cue to move after it has been placed on the video, so I do >> not believe this is a correct understanding. >> >> However, you can always have a cue be replaced by a cue with the same text >> but on a higher line, if you're willing to do some preprocessing on the >> subtitle file. It won't be a smoothly animated scroll, but it would work. >> >> If there is convincing evidence that this kind of subtitle is used on the >> Web, though, we can support it more natively. So far I've only seen it in >> legacy scenarios that do not really map to expected WebVTT use cases. >> >> For supporting those legacy scenarios, you need script anyway (to handle, >> e.g., backspace and moving the cursor). If you have script, doing >> scrolling is possible either by moving the cue, or by not using the >> default UA rendering of the cues at all and doing it manually (e.g. using >> <div>s or <canvas>). >> >> >>> Finally, for paint-on captions, individual letters or words are >>> displayed successively on screen. 
WebVTT supports this functionality >>> with the cue timestamps <xx:xx:xx.xxx>, which allows to specify >>> characters or words to appear with a delay from within a cue. This >>> essentially realizes paint-on captions. Is this correct? >> >> Yes. >> >> >>> (Note that we suggest using relative timestamps inside cues to make this >>> feature more usable.) >> >> It makes it modestly easier to do by hand, but hand-authoring a "paint-on" >> style caption seems like a world of pain regardless of the timestamp >> format we end up using, so I'm not sure it's a good argument for >> complicating the syntax with a second timestamp format. >> >> >>> The HTML spec specifies that it is not allowed to have two tracks that >>> provide the same kind of data for the same language (potentially empty) >>> and for the same label (potentially empty). However, we need >>> clarification on what happens if there is a duplicate track, ie: does >>> the most recent one win or the first one or will both be made available >>> in the UI and JavaScript? >> >> They are both available. >> >> >>> The spec only states that the combination of {kind, type, label} must be >>> unique. It doesn't say what happens if they are not. >> >> Nothing different happens if they are not than if they are. It's just a >> conformance requirement. >> >> >>> Further, the spec says nothing about duplicate labels altogether - what >>> is a browser supposed to do when two tracks have been marked with the >>> same label? >> >> That same as it does if they have different labels. >> >> >>> It is very important that there is a possibility for users to >>> auto-activate tracks. Which track is chosen as the default track to >>> activate depends on the language preferences of the user. The user is >>> assumed to have a list of language preferences which leads this choice. >> >> I've added a "default" attribute so that sites can control this. >> >> >>> In YouTube, if any tracks exist that match the first language >>> preference, the first of those is used as the default. A track with >>> no name sorts ahead of one with a name. The sorting is done according >>> to that language's collation order. In order to override this you >>> would need (1) a default=true attribute for a track which gives it >>> precedence if its language matches, and (2) a way to force the >>> language preference. If no tracks exist for the first language pref, >>> the second language pref is checked, and so on. >>> >>> If the user's language preferences are known, and there are no tracks >>> in that language, you have other options: >>> (1) offer to do auto-translation (or just do it) >>> (2) use a track in the same language that the video's audio is in (if known) >>> (3) if only one track, use the first available track >>> >>> Also make sure the language choice can be overriden by the user >>> through interaction. >>> >>> We’d like to make sure this or a similar algorithm is the recommended >>> way in which browsers deal with caption tracks. >> >> This seems to me to be a user agent quality of implementation issue. User >> preferences almost by definition can't be interoperable, so it's not >> something we can specify. >> >> >>> As far as we understand, you can currently address all cues through >>> ::cue and you can address a cue part through ::cue-part(<voice> || >>> <part> || <position> || <future-compatibility>). However, if we >>> understand correctly, it doesn’t seem to be possible to address an >>> individual cue through CSS, even though cues have individual >>> identifiers. 
This is either an oversight or a misunderstanding on our >>> parts. Can you please clarify how it is possible to address an >>> individual cue through CSS? >> >> I've made the ID referencable from the ::cue() selector argument as an ID >> on the anonymous root element. >> >> >>> Our experience with automated caption creation and positioning on >>> YouTube indicates that it is almost impossible to always place the >>> captions out of the way of where a user may be interested to look at. We >>> therefore allow users to dynamically move the caption rendering area to >>> a different viewport position to reveal what is underneath. We recommend >>> such drag-and-drop functionality also be made available for TimedTrack >>> captions on the Web, especially when no specific positioning information >>> is provided. >> >> I've added text to explicitly allow this. >> >> >> On Sat, 22 Jan 2011, Philip J盲genstedt wrote: >>> >>> Indeed, repeating settings on each cue would be annoying. However, >>> file-wide settings seems like it would easily be too broad, and you'd >>> have to explicitly reverse the effect on the cues where you don't want >>> it to apply. Maybe classes of cue settings or some kind of macros would >>> work better. >> >> My assumption is that similar cues will typically be grouped together, so >> that one could introduce the group with a "DEFAULTS" block and then >> >> >>> Nitpick: Modern Chinese, including captions, is written left-to-right, >>> top-to-bottom, just like English. >> >> Indeed. I don't expect there will be much vertical text captioning. I >> added it primarily to support some esoteric Anime cases. >> >> >> >>> That the intra-cue timings are relative but the timing lines are >>> absolute has bugged me a bit, so if the distinction was more obvious >>> just from the syntax, that'd be great! >> >> They're all absolute. >> >> >>> [for the file signature] "WebSRT" is prettier than "WEBSRT". >> >> The idea is not to be pretty, the idea is to stand out. :-) >> >> >>> I'm inclined to say that we should normalize all whitespace during >>> parsing and not have explicit line breaks at all. If people really want >>> two lines, they should use two cues. In practice, I don't know how well >>> that would fare, though. What other solutions are there? >> >> I think we definitely need line breaks, e.g. for cases like: >> >> -- Do you want to go to the zoo? >> -- Yes! >> -- Then put your shoes on! >> >> ...which is quite common style in some locales. >> >> However, I agree that we should encourage people to let browsers wrap the >> lines. Not sure how to encourage that more. >> >> >> On Sun, 23 Jan 2011, Glenn Maynard wrote: >>> >>> It should be possible to specify language per-cue, or better, per block >>> of text mid-cue. Subtitles making use of multiple languages are common, >>> and it should be possible to apply proper font selection and word >>> wrapping to all languages in use, not just the primary language. >> >> It's not clear to me that we need language information to apply proper >> font selection and word wrapping, since CSS doesn't do it. >> >> >>> When both English subtitles and Japanese captions are on screen, it >>> would be very bad to choose a Chinese font for the Japanese text, and >>> worse to choose a Western font and use it for everything, even if >>> English is the predominant language in the file. >> >> Can't you get around this using explicit styles, e.g. against classes? >> Unless this really is going to be a common problem, I'm not particularly >> concerned about it. 
>> >> >> On Mon, 24 Jan 2011, Philip Jägenstedt wrote: >>> >>> Multi-language subtitles/captions seem to be extremely uncommon, >>> unsurprisingly, since you have to understand all the languages to be >>> able to read them. >>> >>> The case you mention isn't a problem, you just specify Japanese as the >>> main language. >> >> Indeed. >> >> >>> There are a few other theoretical cases: >>> >>> * Multi-language CJK captions. I've never seen this, but outside of >>> captioning, it seems like the foreign script is usually transcribed to >>> the native script (e.g. writing Japanese names with simplified Chinese >>> characters). >>> >>> * Use of Japanese or Chinese words in mostly non-CJK subtitles. This >>> would make correct glyph selection impossible, but I've never seen it. >>> >>> * Voice synthesis of e.g. mixed English/French captions. Given that this >>> would only be useful to people who know both languages, it seems not >>> worth complicating the format for. >> >> Agreed on all fronts. >> >> >>> Do you have any examples of real-world subtitles/captions that would >>> benefit from more fine-grained language information? >> >> This kind of information would indeed be useful. >> >> >> On Mon, 24 Jan 2011, Glenn Maynard wrote: >>> >>> They're very common in anime fansubs: >>> >>> http://img339.imageshack.us/img339/2681/screenshotgg.jpg >>> >>> The text on the left is a transcription, the top is a transliteration, >>> and the bottom is a translation. >> >> Aren't these three separate text tracks? >> >> >>> I'm pretty sure I've also seen cases of translation notes mixing >>> languages within the same caption, eg. "jinja (神社): shrine", but >>> it's less common and I don't have an example handy. >> >> Mixing one CJK language with one non-CJK language seems fine. That should >> always work, assuming you specify good fonts in the CSS. >> >> >>> > The case you mention isn't a problem, you just specify Japanese as the >>> > main language. There are a few other theoretical cases: >>> >>> Then you're indicating that English text is Japanese, which I'd expect >>> to cause UAs to render everything with a Japanese font. That's what >>> happens when I load English text in Firefox and force SJIS: everything >>> is rendered in MS PGothic. That's probably just what Japanese users >>> want for English text mixed in with Japanese text, too--but it's >>> generally not what English users want with the reverse. >> >> I don't understand why we can't have good typography for CJK and non-CJK >> together. Surely there are fonts that get both right? >> >> >> On Mon, 24 Jan 2011, Glenn Maynard wrote: >>> > >>> > [ use multiple tracks ] >>> >>> Personally I'd prefer that, but it would require a good deal of metadata >>> support--marking which tracks are meant to be used together, tagging >>> auxiliary track types so browsers can choose (eg. an "English subtitles >>> with no song caption tracks" option), and so on. I'm sure that's a >>> non-starter (and I'd agree). >> >> It's not that much metadata. It's far less effort than making the >> subtitles in the first place. >> >> >>> I don't think you should need to resort to fine-grained font control to get >>> reasonable default fonts. >> >> I agree entirely, but I don't think you should need to resort to >> fine-grained language tagging either... >> >> >>> The above--semantics vs. presentation--brings something else to mind. >>> One of the harder things to subtitle well is when you have two >>> conversations talking on top of each other. 
This is generally done by >>> choosing a vertical spot for each conversation (generally augmented with >>> a color), so the viewer can easily follow one or the other. Setting the >>> line position *sort of* lets you do this, but that's hard to get right, >>> since you don't know how far apart to put them. You'd have to err >>> towards putting them too far apart (guessing the maximum number of lines >>> text might be wrapped to, and covering up much more of the screen than >>> usually needed), or putting one set on the top of the screen (making it >>> completely impossible to read both at once, rather than just >>> challenging). >>> >>> If I remember correctly, SSA files do this with a hack: wherever there's >>> a blank spot in one or the other conversation, a transparent dummy cue >>> is added to keep the other conversation in the correct relative spot, so >>> the two conversations don't swap places. >>> >>> I mention this because it comes to mind as something well-authored, >>> well-rendered subtitles need to get right, and I'm curious if there's a >>> reliable way to do this currently with WebVTT. If this isn't handled, >>> some scenes just fall apart. >> >> It's intended to be done using the L: feature to pick the lines. If the >> cues have more line wrapping than the author expected, it'll break. The >> only way around that would be to go through the whole file (or at least, >> the whole scene, somehow marked up as such) pre-rendering each cue to work >> out what the maximum line heights would be and then using that offset for >> each cue, etc, but that seems like a whole lot of complexity for a minor >> use case. Is line wrapping really going to be that unpredictable? >> >> >> On Mon, 24 Jan 2011, Philip Jägenstedt wrote: >>> >>> My main point here is that the use cases are so marginal. If there were >>> more compelling ones, it's not hard to support intra-cue language >>> settings using syntax like <lang en>bla</lang> or similar. >> >> Indeed. >> >> >> On Mon, 24 Jan 2011, Glenn Maynard wrote: >>> >>> Here's one that I think was done very well, rendered statically to make >>> sure we're all seeing the same thing: >>> >>> http://zewt.org/~glenn/multiple%20conversation%20example.mpg >>> >>> The results are pretty straightforward. One always stays on top, one >>> always stays on the bottom, and most of the time the spacing between the >>> two is correct--the normal distance the UA uses between two vertical >>> captions (which would be lost by specifying the line height explicitly). >>> Combined with the separate coloring (which is already possible, of >>> course), it's possible to read both conversations and intuitively track >>> which is which, and it's also very easy to just pick one or the other to >>> read. >> >> As far as I can tell, the WebVTT algorithm would handle this case pretty >> well. >> >> >>> One example of how this can be tricky: at 0:17, a caption on the bottom >>> wraps and takes two lines, which then pushes the line at 0:19 upward >>> (that part's simple enough). If instead the top part had appeared >>> first, the renderer would need to figure out in advance to push it >>> upwards, to make space for the two-line caption underneath it. >>> Otherwise, the captions would be forced to switch places. >> >> Right, without lookahead I don't know how you'd solve it. With lookahead >> things get pretty dicey pretty quickly. >> >> >> On Mon, 24 Jan 2011, Tab Atkins Jr. 
wrote: >>> >>> Right now, the WebVTT spec handles this by writing the text in white on >>> top of a partially-transparent black background. The text thus never >>> has contrast troubles, at the cost of a dark block covering up part of >>> the display. >>> >>> Stroking text is easy, though. WebKit has an experimental property for >>> doing it directly. Using existing CSS, it's easy to adapt text-shadow >>> to produce a good outline - just make four shadows, offset by 1px in >>> each direction, and you're good. >> >> WebVTT allows both text-shadow and text-outline. >> >> >> On Wed, 9 Feb 2011, Silvia Pfeiffer wrote: >>> >>> We're trying to avoid the need for multiple transcodings and are trying >>> to achieve something like the following pipeline: broadcast captions -> >>> transcode to WebVTT -> show in browser -> transcode to broadcast devices >>> -> show >> >> Why not just do: >> >> broadcast captions -> transcode to WebVTT -> show in browser >> >> ...for browsers and: >> >> broadcast captions -> show >> >> ...for legacy broadcast devices? >> >> >> In any case the amount of legacy broadcast captions pales in comparison to >> the volume of new captions we will see for the Web. I'm not really >> convinced that legacy broadcast captions are an important concern here. >> >> >>> What is the argument against using <u> in captions? >> >> What is the argument _for_ using <u> in captions? We don't add features >> due to a lack of reasons not to. We add features due to a plethora of >> reasons to do so. >> >> >>> > [ foolip suggests using multiple cues to do blinking ] >>> >>> But from a captioning/subtitling point of view it's probably hard to >>> convert that back to blinking text, since we've just lost the semantic >>> by ripping it into multiple cues (and every program would use different >>> ways of doing this). >> >> I do not think round-tripping legacy broadcast captions through WebVTT is >> an important use case. If that is something that we should support, then >> we should first establish why it is an important use case, and then >> reconsider WebVTT within that context, rather than adding features to >> handle it piecemeal. >> >> >>> I guess what we are discovering is that we can define the general format >>> of WebVTT for the Web, but that there may be an additional need to >>> provide minimum implementation needs (a "profile" if you want - as much >>> as I hate this word). >> >> Personally I have nothing against the word "profile", but I do have >> something against providing for "minimum implementation needs". >> >> Interoperability means everything works the same everywhere. >> >> >>> [re versioning the file format] >>> In a contract between a caption provider and a caption consumer (I am >>> talking about companies here), the caption consumer will want to tell >>> the caption provider what kind of features they expect the caption files >>> to contain and features they want avoided. This links back to the >>> earlier identified need for "profiles". This is actually probably >>> something outside the scope of this group, but I am sure there is a need >>> for such a feature, in particular if we want to keep the development of >>> the WebVTT specification open for future extensions. >> >> I don't see why there would be a need for anything beyond "make sure it >> works with deployed software", maybe with that being explicitly translated >> to specific features and workarounds for known bugs, e.g. "you can use >> ruby, but make sure you don't have timestamps out of order". 
>> >> This, however, has no correlation to versions of the format. >> >> >> On Mon, 14 Feb 2011, Philip Jägenstedt wrote: >>> > >>> > [line wrapping] >>> >>> There's still plenty of room for improvements in line wrapping, though. >>> It seems to me that the main reason that people line wrap captions >>> manually is to avoid getting two lines of very different length, as that >>> looks quite unbalanced. There's no way to make that happen with CSS, and >>> AFAIK it's not done by the WebVTT rendering spec either. >> >> WebVTT just defers to CSS for this. I agree that it would be nice for CSS >> to allow UAs to do more clever things here and (more importantly) for UAs >> to actually do more clever things here. >> >> >> On Tue, 15 Feb 2011, Silvia Pfeiffer wrote: >>> foolip wrote: >>> > >>> > Sure, it's already handled by the current parsing spec, since it >>> > ignores everything up to the first blank line. >>> >>> That's not quite how I'm reading the spec. >>> >>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#webvtt-0 >>> allows >>> "Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER >>> TABULATION (tab) character followed by any number of characters that >>> are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) >>> characters." >>> after the "WEBVTT FILE" magic. >>> To me that reads like all of the extra stuff has to be on the same line. >>> I'd prefer if this read "any character except for two WebVTT line >>> terminators", then it would all be ready for such header-style >>> metadata. >> >> That's the syntax rules. It's not the parser. >> >> >>> I'm told <u> is fairly common in traditional captions. >> >> I've never seen it. Do you have any data on this? >> >> >>> > Personally, I think we're going to see more and more devices running >>> > full browsers with webfonts support, and that this isn't going to be a >>> > big problem. >>> >>> I tend to agree and in fact I see that as the shiny future. Just not >>> quite yet. >> >> We're not quite at WebVTT yet either. Currently, there's more support for >> WebFonts than WebVTT. >> >> >> On Tue, 15 Feb 2011, Glenn Maynard wrote: >>> >>> I think that, no matter what you do, people will insert line breaks in >>> cues. I'd follow the HTML model here: convert newlines to spaces and >>> have a separate, explicit line break like <br> if needed, so people >>> don't manually line-break unless they actually mean to. >> >> The line-breaks-are-line-breaks feature is one of the features that >> originally made SRT seem like a good idea. It still seems like the neatest >> way of having a line break. >> >> >>> Related to line breaking, should there be an escape? Inserting >>> nbsp literally into files is somewhat annoying for authoring, since >>> they're indistinguishable from regular spaces. >> >> How common would that be? >> >> >> On Thu, 10 Feb 2011, Silvia Pfeiffer wrote: >>> >>> Further discussions at Google indicate that it would be nice to make >>> more components optional. Can we have something like this: >>> >>> [[h*:]mm:]ss[.[d[c[m]]] | s*[.d[c[m]]] >>> >>> Examples: >>> 23 = 23 seconds >>> 23.2 = 23 sec, 1 decisec >>> 1:23.45 = 1 min, 23 sec, 45 centisec >>> 123.456 = 123 sec, 456 millisec >> >> Currently the syntax is [h*:]mm:ss.sss; what's the advantage of making >> this more complicated? It's not like most subtitled clips will be shorter >> than a minute. Also, why would we want to support multiple redundant ways >> of expressing the same time? (e.g. 
01:00.000 and 60.000) >> >> Readability of VTT files seems like it would be helped by consistency, >> which suggests using the same format everywhere, as much as possible. >> >> >> On Sun, 16 Jan 2011, Mark Watson wrote: >>> >>> I have been looking at how the video element might work in an adaptive >>> streaming context where the available media are specified with some kind >>> of manifest file (e.g. MPEG DASH Media Presentation Description) rather >>> than in HTML. >>> >>> In this context there may be choices available as to what to present, >>> many but not all related to accessibility: >>> >>> - multiple audio languages >>> - text tracks in multiple languages >>> - audio description of video >>> - video with open captions (in various languages) >>> - video with sign language >>> - audio with directors commentary >>> - etc. >>> >>> It seems natural that for text tracks, loading the manifest could cause >>> the video element to be populated with associated <track> elements, >>> allowing the application to discover the choices and activate/deactivate >>> the tracks. >> >> Not literal <track> elements, hopefully, but in-band text tracks (known as >> "media-resource-specific text track" in the spec). >> >> >>> But this seems just for text tracks. I know discussions are underway on >>> what to do for other media types, but my question is whether it would be >>> better to have a consistent solution for selection amongst the available >>> media that applies for all media types ? >> >> They're pretty different from each other, so I don't know that one >> solution would make sense for all of these. >> >> Does the current solution (the videoTracks, audioTracks, and textTracks >> attributes) adequately address your concern? >> >> >> On Mon, 17 Jan 2011, Jeroen Wijering wrote: >>> >>> We are getting some questions from JW Player users that HTML5 video is >>> quite wasteful on bandwidth for longer videos (think 10min+). This >>> because browsers download the entire movie once playback starts, >>> regardless of whether a user pauses the player. If throttling is used, >>> it seems very conservative, which means a lot of unwatched video is in >>> the buffer when a user unloads a video. >>> >>> I did a simple test with a 10 minute video: playing it; pausing after 30 >>> seconds and checking download progress after another 30 seconds. With >>> all browsers (Firefox 4, Safari 5, Chrome 8, Opera 11, iOS 4.2), the >>> video would indeed be fully downloaded after 60 seconds. Some throttling >>> seems to be applied by Safari / iOS, but this could also be bandwidth >>> fluctuations on my side. Either way, all browsers downloaded the 10min >>> video while only 30 seconds were being watched. >>> >>> The HTML5 spec is a bit generic on this topic, allowing mechanisms such >>> as stalling and throttling but not requiring them, or prescribing a >>> scripting interface: >>> >>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource >> >> Right, this is an area that is left up to implementations; a quality of >> implementation issue. >> >> >>> A suggestion would be to implement / expose a property called >>> "downloadBufferTarget". It would be the amount of video in seconds the >>> browser tries to keep in the download buffer. >> >> Wouldn't this be very situation-specific? e.g. if I know I'm about to go >> into a tunnel for five minutes, I want five minutes of buffered data. 
If >> my connection has a high packet loss rate and could stall for upwards of >> 10 seconds, I want way more than 10 seconds in my buffer. If my connection >> is such that I can't download data in realtime, I want the whole video in >> my buffer. If my connection is such that I have 8ms latency to the video >> server and enough bandwidth to transfer the whole four hour file in 3 >> seconds, then really I don't need anything in my buffer. >> >> >> On Mon, 17 Jan 2011, Roger Hågensen wrote: >>> On 2011-01-17 18:36, Markus Ernst wrote: >>> > >>> > Could this be done at the user side, e.g. with some browser setting? >>> > Or even by a "stop downloading" control in the player? An intuitive >>> > user control would be separate stop and pause buttons, as we know them >>> > from tape and CD players. Pause would then behave as it does now, >>> > while stop would cancel downloading. >>> >>> I think that's the right way to do it, this should be in the hands of >>> the user and exposed as a preference in the browsers. >> >> Agreed. >> >> >>> Although exposing (read only?) the user's preferred buffer setting to >>> the HTML App/Plugin etc. would be a benefit I guess as the desired >>> buffering could be communicated back to the streaming server for example >>> for a better bandwidth utilization. >> >> How would the information be used? >> >> >> On Mon, 17 Jan 2011, Zachary Ozer wrote: >>> >>> What no one has mentioned so far is that the real issue isn't the >>> network utilization or the memory capacity of the devices, it's >>> bandwidth cost. >>> >>> The big issue for publishers is that they're incurring higher costs when >>> using the <video> tag, which is a disincentive for adoption. >>> >>> Since there are situations where both the publisher and the user are >>> potentially incurring bandwidth costs (or have other limitations), we >>> could allow the publisher to specify downloadBufferTarget and the user >>> to specify a setting in the browser's config. The browser would then >>> actually buffer min(user setting, downloadBufferTarget). At that point >>> there would probably need to be another read-only property that >>> specified what value the browser is currently using as its buffer >>> length, but maybe the getter for downloadBufferTarget is sufficient. >> >> I think before we get something that elaborate set up, we should just try >> getting preload="" implemented. :-) That might be sufficient. >> >> >> On Tue, 18 Jan 2011, Robert O'Callahan wrote: >>> >>> One solution that could work here is to honour dynamic changes to >>> 'preload', so switching preload to 'none' would stop buffering. Then a >>> script could do that, for example, after the user has paused the video >>> for ten seconds. The script could also look at 'buffered' to make its >>> decision. >> >> If browsers want to do that I'm quite happy to add something explicitly to >> that effect to the spec. Right now the spec doesn't disallow it. >> >> >> On Wed, 19 Jan 2011, Philip Jägenstedt wrote: >>> >>> The only difference between preload=none and preload=metadata is how >>> much is fetched if the user doesn't interact at all with the video. Once >>> the user has begun playing, I think the two mean the same thing: "please >>> don't waste my bandwidth more than necessary". In other words, I think >>> that for preload=metadata, browsers should be somewhat conservative even >>> after playback has begun, not going all the way to the preload=auto >>> behavior. 
>> >> The descriptions are somewhat loose, but something like this could work, >> yes. (Though I'd say after playing preload=metadata and preload=auto are >> the same and preload=none is the one that says to avoid bandwidth usage, >> but that's just an artifact of the way I wrote the descriptions.) >> >> >> On Tue, 18 Jan 2011, Zachary Ozer wrote: >>> >>> Currently, there's no way to stop / limit the browser from buffering - >>> once you hit play, you start downloading and don't stop until the >>> resource is completely loaded. This is largely the same as Flash, save >>> the fact that some browsers don't respect the preload attribute. (Side >>> note: I also haven't found a browser that stops loading the resource >>> even if you destroy the video tag.) >>> >>> There have been a few suggestions for how to deal with this, but most >>> have revolved around using downloadBufferTarget - a settable property >>> that determines how much video to buffer ahead in seconds. Originally, >>> it was suggested that the content producers should have control over >>> this, but most seem to favor the client retaining some control since >>> they are the most likely to be in low bandwidth situations. (Publishers >>> who want strict bandwidth control could use a more advanced server and >>> communication layer ala YouTube). >>> >>> The simplest enhancement would be to honor the downloadBufferTarget only >>> when readyState=HAVE_ENOUGH_DATA and playback is paused, as this would >>> imply that there is not a low bandwidth situation. >> >> It seems the simplest enhancement would be to have the browsers do the >> right thing (e.g. download enough to get to HAVE_ENOUGH_DATA and stop if >> the video is paused, or some such), not to add a feature that all Web >> authors would have to handle. >> >> >> On Tue, 18 Jan 2011, Boris Zbarsky wrote: >>> >>> In general, depending on finalizers to release resources (which is >>> what's happening here) is not really a workable setup. Maybe we need an >>> api to explicitly release the data on an audio/video tag? >> >> The spec suggests removing the element's src="" attribute and <source> >> elements and then calling the element's load() method. >> >> The spec also suggests that implementors release all resources used by a >> media element when that media element is an orphan when the event loop >> spins. >> >> See the "Best practices for authors using media elements" and "Best >> practices for implementors of media elements" sections. >> >> >> On Wed, 19 Jan 2011, Andy Berkheimer wrote: >>> >>> In the case where the viewer does not have enough bandwidth to stream >>> the video in realtime, there are two basic options for the experience: >>> - buffer the majority of the video (per Glenn and Boris' discussion) >>> - switch to a lower bitrate that can be streamed in realtime >>> >>> This thread has focused primarily of the first option and this is an >>> experience that we see quite a bit. This is the option favored amongst >>> enthusiasts and power users, and also makes sense when a viewer has made >>> a purchase with an expectation of quality. And there's always the >>> possibility that the user does not have enough bandwidth for even the >>> lowest available bitrate. >>> >>> But the second option is the experience that the majority of our viewers >>> expect. >>> >>> The ideal interface would have a reasonable default behavior but give an >>> application the ability to implement either experience depending on user >>> preference (or lack thereof), viewing context, etc. 
>> >> Agreed. This is the kind of thing that a good streaming protocol can >> negotiate in realtime. >> >> >>> I believe Chrome's current implementation _does_ stall the HTTP >>> connection (stop reading from the socket interface but keep it open) >>> after some amount of readahead - a magic hardcoded constant. We've run >>> into issues there - their browser readahead buffer is too small and >>> causing a lot of underruns. >> >> It's early days. File bugs! >> >> >>> No matter how much data you pass between client and server, there's >>> always some useful playback state that the client knows and the server >>> does not - or the server's view of the state is stale. This is >>> particularly true if there's an HTTP proxy between the user agent and >>> the server. Any behavior that could be implemented through an advanced >>> server/communication layer can be achieved in a simpler, more robust >>> fashion with a solid buffer management implementation that provides >>> "advanced" control through javascript and attributes. >> >> The main difference is that a protocol will typically be implemented a few >> times by experienced programmers writing servers and clients, which will >> then be deployed and used by less experienced (in this kind of thing) Web >> developers, while if we just expose it to JavaScript, the people >> implementing it will be a combination of experienced library authors and >> those same Web developers, and the result will likely be less successful. >> >> However, the two aren't mutually exclusive. We could do one and then later >> (or at the same time) do the other. >> >> >> On Tue, 18 Jan 2011, Roger Hågensen wrote: >>> >>> It may sound odd but in low storage space situations, it may be >>> necessary to unbuffer what has been played. Is this supported at all >>> currently? >> >> Yes. >> >> >>> I think that the buffering should basically be a "moving window" (I hope >>> most here are familiar with this term?), and that the size of the moving >>> window should be determined by storage space and bandwidth and browser >>> preference and server preference, plus make sure the window supports >>> skipping anywhere without needing to buffer up to it, and avoid >>> buffering from the start just because the user skipped back a little to >>> catch something they missed (another annoyance). This is the only >>> logical way to do this really. Especially since HTTP 1.1 has byterange >>> support there is nothing preventing it from being implemented, and I >>> assume other popular streaming protocols supports byterange as well? >> >> Implementations are allowed to do that. >> >> >> On Tue, 18 Jan 2011, Silvia Pfeiffer wrote: >>> >>> I think that's indeed one obvious improvement, i.e. when going to pause >>> state, stop buffering when readyState=HAVE_ENOUGH_DATA (i.e. we have >>> reached canplaythrough state). >> >> The spec allows this already. >> >> >>> However, again, I don't think that's sufficient. Because we will also >>> buffer during playback and it is possible that we buffer fast enough to >>> have buffered e.g. the whole of a 10min video by the time we hit pause >>> after 1 min and stop watching. That's far beyond canplaythrough and >>> that's 9min worth of video download wasted bandwidth. This is where the >>> suggested downloadBufferTarget would make sense. It would basically >>> specify how much more to download beyond HAVE_ENOUGH_DATA before pausing >>> the download. >> >> I don't understand how a site can know what the right value is for this. 
>> Users aren't going to understand that they have to control the buffering >> if (e.g.) they're about to go into a tunnel and they want to make sure >> it's buffered all the way through. It should just work, IMHO. >> >> >> On Tue, 18 Jan 2011, David Singer wrote: >>> >>> If you want a more tightly coupled supply/consume protocol, then use >>> one. As long as it's implemented by client and server, you're on. >>> >>> Note that the current move of the web towards download in general and >>> HTTP in particular is due in no small part to the fact that getting more >>> tightly coupled protocols -- actually, any protocol other than HTTP -- >>> out of content servers, across firewalls, through NATs, and into clients >>> is...still a nightmare. So, we've been given a strong incentive by all >>> those to use HTTP. It's sad that some of them are not happy with that >>> result, but it's going to be hard to change now. >> >> Agreed, though in practice there are certainly ways to get two-way >> protocols through. WebSocket does a pretty good job, for example. But >> designing a protocol for this is out of scope for this list, really. >> >> >> On Tue, 18 Jan 2011, David Singer wrote: >>> >>> In RTSP-controlled RTP, there is a tight relationship between the play >>> point, and play state, the protocol state (delivering data or paused) >>> and the data delivered (it is delivered in precisely real-time, and >>> played and discarded shortly after playing). The server delivers very >>> little more data than is actually watched. >>> >>> In HTTP, however, the entire resource is offered to the client, and >>> there is no protocol to convey play/paused back to the server, and the >>> typical behavior when offered a resource in HTTP is to make a simple >>> binary decision to either load it (all) or not load it (at all). So, by >>> providing a media resource over HTTP, the server should kinda be >>> expecting this 'download' behavior. >>> >>> Not only that, but if my client downloads as much as possible as soon as >>> possible and caches as much as possible, and yours downloads as little >>> as possible as late as possible, you may get brownie points from the >>> server owner, but I get brownie points from my local user -- the person >>> I want to please if I am a browser vendor. There is every incentive to >>> be resilient and 'burn' bandwidth to achieve a better user experience. >>> >>> Servers are at liberty to apply a 'throttle' to the supply, of course >>> ("download as fast as you like at first, but after a while I'll only >>> supply at roughly the media rate"). They can suggest that the client be >>> a little less aggressive in buffering, but it's easily ignored and the >>> incentive is to ignore it. >>> >>> So I tend to return to "if you want more tightly-coupled behavior, use a >>> more tightly-coupled protocol"... >> >> Indeed. >> >> >> On Wed, 19 Jan 2011, Philip Jägenstedt wrote: >>> >>> The 3 preload states imply 3 simple buffering strategies: >>> >>> none: don't touch the network at all >>> metadata: buffer as little as possible while still reaching readyState >>> HAVE_METADATA >>> auto: buffer as fast and much as possible >> >> "auto" isn't "as fast and much as possible", it's "as fast and much as >> will make the user happy". In some configurations, it might be the same as >> "none" (e.g. if the user is paying by the byte and hates video). >> >> >>> However, the state we're discussing is when the user has begun playing the >>> video. 
The spec doesn't talk about it, but I call it: >>> >>> invoked: buffer as little as possible without readyState dropping below >>> HAVE_FUTURE_DATA (in other words: being able to play from currentTime to >>> duration at playbackRate without waiting for the network) >> >> There's also a fifth state, let's call it "aggressive", where even while >> playing the video the UA is trying to download the whole thing in case the >> connection drops. >> >> >>> If the available bandwidth exceeds the bandwidth of the resource, some >>> kind of throttling must eventually be used. There are mainly 2 options >>> for doing this: >>> >>> 1. Throttle at the TCP level by not reading data from the socket (not at all >>> to suspend, or at a controlled rate to buffer ahead) >>> 2. Use HTTP byte ranges, making many smaller requests with any kind of >>> throttling at the TCP level >> >> There's also option 3, to handle the fifth state above: don't throttle. >> >> >>> When HTTP byte ranges are used to achieve bandwidth management, it's >>> hard to talk about a single downloadBufferTarget that is the number of >>> seconds buffered ahead. Rather, there might be an upper and lower limit >>> within which the browser tries to stay, so that each request can be of a >>> reasonable size. Neither an author-provided minimum nor maximum value can >>> be followed particularly closely, but could possibly be taken as a hint >>> of some sort. >> >> Would it be a more useful hint than "preload"? I'm skeptical about adding >> many hints with no requirements. If there's some specific further >> information we can add, though, it might make sense to add more features >> to "preload". >> >> >>> The above buffering strategies are still not enough, because users seem >>> to expect that in a low-bandwidth situation, the video will keep >>> buffering until they can watch it through to the end. These seem to be >>> the options for solving the problem: >>> >>> * Make sites that want this behavior set .preload='auto' in the 'paused' >>> event handler >>> >>> * Add an option in the context menu to "Preload Video" or some such >>> >>> * Cause an invoked (see dfn above) but paused video to behave like >>> preload=auto >>> >>> * As above, but only when the available bandwidth is limited >>> >>> I don't think any of these solutions are particularly good, so any input >>> on other options is very welcome! >> >> If users expect something, it seems logical that it should just happen. I >> don't have a problem with saying that it should depend on preload="", >> though. If you like I can make the spec explicitly describe what the >> preload="" hints mean while video is playing, too. >> >> >> On Wed, 19 Jan 2011, Zachary Ozer wrote: >>> >>> What if, instead of trying to solve this problem, we leave it up to the >>> publishers? The current behavior would be unchanged, but we could add >>> explicit bandwidth management API calls, ie startBuffer() and >>> stopBuffer(). This would let developers / site publishers control how >>> much to buffer and when. >> >> We couldn't depend on it (most people presumably won't want to do anything >> but give the src="" of their video). >> >> >>> We might also consider leaning on users a bit to tell us what they want. >>> For example, I think people are pretty used to hitting play and then >>> pause to buffer until the end of the video. What if we just used our >>> bandwidth heuristics while in the play state, and buffered blindly when >>> a pause occurs less than X seconds into a video? 
I won't argue that this >>> is a wonderful solution (or a habit we should encourage), but I figured >>> I'd throw a random idea out there... >> >> That seems like pretty ugly UI. :-) >> >> >> On Thu, 20 Jan 2011, Glenn Maynard wrote: >>> >>> I think that pausing shouldn't affect read-ahead buffering behavior. >>> I'd suggest another preload value, preload=buffer, sitting between >>> "metadata" and "auto". In addition to everything loaded by "metadata", >>> it also fills the read-ahead buffer (whether the video is playing or >>> not). >>> >>> - If a page wants prebuffering only (not full preloading), it sets >>> preload=buffer. This can be done even when the video is paused, so when >>> the user presses play, the video starts instantly without pausing for a >>> server round-trip like preload=metadata. >> >> So this would be to buffer enough to play through assuming the network >> remains at the current bandwidth, but no more? >> >> >>> - If a page wants prebuffering while playing, but unlimited buffering when >>> paused (per Zachary's suggestion), it sets preload=buffer when playing and >>> preload=auto when paused. >> >> Again, note that "auto" doesn't mean "buffer everything", it means "do >> whatever is best for the user". >> >> I don't mind adding new values if the browser vendors are going to use >> them. >> >> >> On Sat, 22 Jan 2011, David Singer wrote: >>> >>> When the HTML5 states were first proposed, I went through a careful >>> exercise to make sure that they were reasonably delivery-technology >>> neutral, i.e. that they applied equally well if say RTSP/RTP was used, >>> some kind of dynamic streaming, simple HTTP, and so on. >>> >>> I am concerned that we all tend to assume that HTML==HTTP, but the >>> source URL for the media might have any protocol type, and the HTML >>> attributes, states etc. should apply (or clearly not apply) to anything. >>> >>> Assuming only HTTP, in the markup, is not a good direction. >> >> Agreed. >> >> >> On Thu, 20 Jan 2011, Matthew Gregan wrote: >>> >>> The media seek algorithm (4.8.10.9) states that the current playback >>> position should be set to the new playback position during the >>> asynchronous part of the algorithm, just before the seeking event is >>> fired. [...] >> >> On Thu, 20 Jan 2011, Philip Jägenstedt wrote: >>> >>> There have been two non-trivial changes to the seeking algorithm in the >>> last year: >>> >>> Discussed at http://lists.w3.org/Archives/Public/public-html/2010Feb/0003.html >>> led to http://html5.org/r/4868 >>> >>> Discussed at http://lists.w3.org/Archives/Public/public-html/2010Jul/0217.html >>> led to http://html5.org/r/5219 >> >> Yeah. In particular, sometimes there's no way for the UA to know >> asynchronously if the seek can be done, which is why the attribute is set >> after the method returns. It's not ideal, but the alternative is not >> always implementable. >> >> >>> With that said, it seems like there's nothing that guarantees that the >>> asynchronous section doesn't start running while the script is still >>> running. >> >> Yeah. It's not ideal, but I don't really see what we can do about it. >> >> >>> It's also odd that currentTime is updated before the seek has actually >>> been completed, but the reason for this is that the UI should show the >>> new position. >> >> Not just the UI. The current position is what the browser is trying to >> play; if the current position didn't move, then the browser wouldn't be >> trying to play it. 
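A small script illustrating the ordering described above, as the spec text stood at the time (whether currentTime should change synchronously was still under discussion, so treat this as a sketch rather than guaranteed behaviour):

  var video = document.querySelector('video');
  video.addEventListener('seeking', function () {
    // By this point currentTime already reports the new position,
    // even though the media data for it may not have arrived yet.
    console.log('seeking to ' + video.currentTime);
  });
  video.addEventListener('seeked', function () {
    console.log('seek completed at ' + video.currentTime);
  });
  video.currentTime = 60; // request a seek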
>> >> >> On Fri, 4 Feb 2011, Matthew Gregan wrote: >>> >>> For anyone following along, the behaviour has now been changed in the >>> Firefox 4 nightly builds. >> >> On Mon, 24 Jan 2011, Robert O'Callahan wrote: >>> >>> I agree. I think we should change behavior to match author expectations >>> and the other implementations, and let the spec change to match. >> >> How do you handle the cases where it's not possible? >> >> >> If all the browsers can do it, I'm all for going back to having >> currentTime change synchronously. >> >> >> On Sat, 29 Jan 2011, Lubomir Toshev wrote: >>> >>> [W]hen the video tag has embedded browser controls displayed and I click >>> anywhere on the controls, they cause a video tag click event. If I want >>> to toggle play/pause on video area click, then I cannot do this, because >>> clicking on the play control button, fires play, then click event fires >>> for video tag and when I toggle it pauses. So this behavior that every >>> popular flash player has cannot be achieved. There is no way to >>> understand that the click.target is the embedded browser controls area. >>> I think that a nice improvement will be to expose this information, in >>> the target, that it actually is embedded browser controls. Or clicking >>> the embedded browser controls should not produce a click event for video >>> tag. After all browser controls are native and do not have >>> representation in the DOM. Let me know what you think about this. >> >> On Sat, 29 Jan 2011, Aryeh Gregor wrote: >>> >>> Well, to begin with, you could just use your own controls rather than >>> the browser's built-in controls. Then you have no problem. If you're >>> using the browser's built-in controls, maybe you should stick with the >>> browser's control conventions throughout, which presumably doesn't >>> include toggling play/pause on click. >>> >>> I'm not sure this is a broad enough problem to warrant exposing the >>> extra information in the target. Are there any other use-cases for such >>> info? >> >> On Sun, 30 Jan 2011, Lubomir Toshev wrote: >>> >>> To elaborate a bit, I'm a control developer and I have my own custom >>> controls. But we want to allow for the customer to use the default >>> browser controls if they want to. This can be done by switching an >>> option in my jQuery widget - browserControls - true/false. Or through >>> browser context menu shown by default on right click. So I'm trying to >>> be flexible enough for the customer. >>> >>> I was thinking about this >>> 1) that adding a transparent overlay over the browser controls >>> Or >>> 2) to detect the click position and if it is some pixels away from the >>> bottom of the video tag >>> >>> will fix this, but every browser has different height for its embedded >>> controls and I should hardcode this height in my code, which is just not >>> manageable. >>> >>> I can always add a limitation when using browser controls, toggle >>> play/pause on video area click will be turned off, but I want to achieve >>> similar behavior in all the browsers no matter whether they use embedded >>> controls or not. >>> >>> So I think this tiny click.target thing will be very useful. >> >> On Sun, 30 Jan 2011, Glenn Maynard wrote: >>> >>> Even as a bad hack it's simply not possible; for example, there's no way >>> to tell whether a pop-out volume control is open or not. >>> >>> I think the primary use case browser controls are meant for is when >>> scripting isn't available at all. 
They aren't very useful when you're >>> using any kind of scripts with the video. Another problem, related to >>> your other post about captioning, is that it's impossible to put >>> anything between the video and the controls, so your captions will draw >>> *on top of* browser controls. >> >> On Mon, 31 Jan 2011, Simon Pieters wrote: >>> >>> See http://lists.w3.org/Archives/Public/public-html/2009Jun/0395.html >>> >>> I suggested that the browser would not generate an event at all when >>> using the native controls. Seemingly there was no reply to Hixie's >>> request for opinion from other implementors. >> >> On Mon, 31 Jan 2011, Glenn Maynard wrote: >>> >>> There are other meaningful ways to respond to these events; for example, >>> to pull its container to the top of the draw order if it's a floating >>> window. I should be able to capture mousedown on the container to do >>> this, regardless of content. >> >> On Mon, 31 Jan 2011, Simon Pieters wrote: >>> >>> How about just suppressing activation events like click? >> >> On Mon, 31 Jan 2011, Glenn Maynard wrote: >>> >>> That makes more sense than suppressing the entire mousedown/mouseup >>> events (and keydown, touchstart, etc). >>> >>> Also, it means you can completely emulate the event behavior of the >>> default browser controls with scripts: preventDefault on mousedown to >>> prevent click events. That's probably not what you actually want to do, >>> but it means the default controls aren't doing anything special: their >>> effect on events can be understood entirely in terms of what scripted >>> events can already do. >> >> On Mon, 31 Jan 2011, Lubomir Toshev wrote: >>> >>> I totally agree that events should not be raised, when they originate >>> from the native browser controls. This would make it much simpler. I >>> filed the same bug for Opera 11 last week. >> >> As with the post Simon cites above, I'm happy to do this kind of thing, if >> multiple vendors agree that it makes sense. If you would like this to be >> done, I recommend getting other browser vendors to tell me it sounds good! >> >> >> On Sat, 29 Jan 2011, Lubomir Toshev wrote: >>> >>> [V]ideo should expose API for currentFrame, so that when control >>> developers want to add support for subtitles on their own, to be able to >>> support formats that display the subtitles according to the current >>> video frame. This is a limitation to the current design of the video >>> tag. >> >> On Sun, 30 Jan 2011, Lubomir Toshev wrote: >>> >>> We were trying to add support for subtitles for our player control that >>> uses video tag as its base. There are two popular subtitle formats *.srt >>> which uses currentTime to show the subtitles where they should be. Like >>> 0:01:00 - 0:01:30 - "What a nice hotel." While the other popular format >>> is *.sub which uses the currentFrame to show the proper subtitles. Like >>> {45600}, {45689} - "What a nice hotel". And if I want to add this >>> support it would be good if video tag exposes currentFrame, so that I >>> can show properly the subtitles in a span positioned over the video. Now >>> does it make more sense? >>> >>> I know video will have embedded subtitle support, but I think that it >>> should be flexible enough to allow building such features like the one >>> above. What do you think? To me this is worth adding because, it should >>> be really easy to implement? >> >> We'll probably add that along with the metrics, when we add those, if >> there's a strong use case for it. 
I'm not sure that supporting frame-based >> subtitles is a good use case though. >> >> >> On Mon, 14 Feb 2011, David Flanagan wrote: >>> >>> The draft specification defines 20+ media event handler IDL attributes >>> on HTMLElement. These events are non-bubbling and are always targeted >>> at <audio> and <video> tags, so I wonder if they wouldn't be better >>> defined on HTMLMediaElement instead. >> >> All event handlers are on HTMLElement, to make implementations easier and >> to make the platform simpler. >> >> >> On Tue, 15 Feb 2011, David Flanagan wrote: >>> >>> Fair enough, though I do think it will confuse developers who will think >>> that those media events bubble. (I'll be documenting them as properties >>> of HTMLMediaElement). >> >> Whether an event bubbles or not is up to the place that dispatches the >> event, not the place that hears the event. >> >> >>> What about Document and Window? What's the justification for defining >>> the media event handler attributes on those objects? >> >> Same. It allows the same logic to be used everywhere. >> >> >> On Mon, 14 Feb 2011, Kevin Marks wrote: >>> On Mon, Feb 14, 2011 at 2:39 PM, Ian Hickson <ian@hixie.ch> wrote: >>> > On Fri, 19 Nov 2010, Per-Erik Brodin wrote: >>> > > >>> > > We are about to start implementing stream.record() and >>> > > StreamRecorder. The spec currently says that "the file must be in >>> > > a format supported by the user agent for use in audio and video >>> > > elements" which is a reasonable restriction. However, there is >>> > > currently no way to set the output format of the resulting File that >>> > > you get from recorder.stop(). It is unlikely that specifying a >>> > > default format would be sufficient if you in addition to container >>> > > formats and codecs consider resolution, color depth, frame rate etc. >>> > > for video and sample size and rate, number of channels etc. for >>> > > audio. >>> > > >>> > > Perhaps an argument should be added to record() that specifies the >>> > > output format from StreamRecorder as a MIME type with parameters? >>> > > Since record() should probably throw when an unsupported type is >>> > > supplied, it would perhaps be useful to have a canRecordType() or >>> > > similar to be able to test for supported formats. >>> > >>> > I haven't added anything here yet, mostly because I've no idea what to >>> > add. The ideal situation here is that we have one codec that everyone >>> > can read and write and so don't need anything, but that may be >>> > hopelessly optimistic. >>> >>> That isn't the ideal, as it locks us into the current state of the art >>> forever. The ideal is to enable multiple codecs + formats that can be >>> swapped out over time. That said, uncompressed audio is readily >>> codifiable, and we could pick a common file format, sample rate, >>> bitdepth and channel count specification. >> >> It doesn't lock us in to one format, we can always add more formats later. >> Right now, we have zero formats, so one format would be a huge step up. 
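Returning to the event handler discussion earlier in this message, a short illustration of the non-bubbling point (a sketch only; the handler properties are defined broadly, but the events themselves are targeted at the media element):

  var video = document.querySelector('video');
  video.onplay = function () {
    // Fires: the 'play' event is targeted at the media element itself.
  };
  document.onplay = function () {
    // The property exists on Document too, but 'play' does not bubble,
    // so this never fires for the <video> above; a capturing listener
    // on an ancestor would be needed to observe it from there.
  };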
>> >> >> On Fri, 4 Mar 2011, Philip Jägenstedt wrote: >>> On Thu, 03 Mar 2011 22:15:58 +0100, Aaron Colwell <acolwell@google.com> >>> wrote: >>> > >>> > I was looking at the resource fetch >>> > algorithm <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource> >>> > and fetching resources >>> > <http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#fetch> >>> > sections of the HTML5 spec to determine what the proper behavior is >>> > for handling redirects. Both YouTube and Vimeo do 302 redirects to >>> > different hostnames from the URLs specified in the src attribute. It >>> > looks like the spec says that playback should fail in these cases >>> > because they are from different origins (Section 2.7 Fetching >>> > resources bullet 7). This leads me to a few questions. >>> > >>> > 1. Is my interpretation of the spec correct? Sample YouTube & Vimeo URLs are >>> > shown below. >>> > YouTube : src : http://v22.lscache6.c.youtube.com/videoplayback? ... >>> > redirect : http://tc.v22.cache6.c.youtube.com/videoplayback? >>> > ... >>> > >>> > Vimeo : src : http://player.vimeo.com/play_redirect? ... >>> > redirect : http://av.vimeo.com/05 ... >>> >>> Yes, from what I can tell you're correct, but I think it's not >>> intentional. The behavior was changed by <http://html5.org/r/5111> in >>> 2010-06-25, and this is the first time I've noticed it. Opera (and I >>> assume most if not all other browsers) already supports HTTP redirects >>> for <video> and I don't think it makes much sense to disallow it. For >>> security purposes, the origin of the resource is considered to be the >>> final destination, not any of the origins in the redirect chain. >> >> This was fixed recently. >> >> >> On Fri, 18 Mar 2011, Eric Winkelman wrote: >>> >>> For in-band metadata tracks, there is neither a standard way to >>> represent the type of metadata in the HTMLTrackElement interface nor is >>> there a standard way to represent multiple different types of metadata >>> tracks. >> >> There can be a standard way. The idea is that all the types of metadata >> tracks that browsers will support should be specified so that all browsers >> can map them the same way. I'm happy to work with anyone interested in >> writing such a mapping spec, just let me know. >> >> >>> Proposal: >>> >>> For TimedTextTracks with kind=metadata the @label attribute should >>> contain a MIME type for the metadata and that a track only contain Cues >>> created from metadata of that MIME type. >>> >>> This implies that streams with multiple types of metadata require the >>> creation of multiple metadata track objects, one for each MIME type. >> >> This might make sense if we had a defined way of getting such a MIME type >> (and assuming you're talking about the IDL attributes, not the content >> attributes). >> >> >> On Tue, 22 Mar 2011, Eric Winkelman wrote: >>> >>> Ah, yes, now I understand the confusion. Within the whatwg specs, the >>> word "attribute" is generally used and I was trying to be consistent. >> >> The WHATWG specs refer to content attributes (those on elements) and IDL >> attributes (those on objects, which generate properties in JS). The @foo >> syntax is never used in the WHATWG specs. It's usually used in a W3C >> context just to refer to content attributes, by analogy to the XPath >> syntax. (Personally I prefer foo="" since it's less ambiguous.) 
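To make the content-attribute/IDL-attribute distinction concrete, a small illustration using the media element (names here are just the standard ones; nothing beyond that is assumed):

  var video = document.querySelector('video');
  video.getAttribute('src'); // the content attribute, exactly as written in the markup
  video.src;                 // the IDL attribute, reflecting it as a resolved absolute URL
  video.currentSrc;          // the URL of the resource actually selected (from src or <source>)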
>> >> >> On Mon, 21 Mar 2011, Eric Winkelman wrote: >>> >>> No, I'm not saying that, but as far as I can tell from the spec, it is >>> undefined how the user agent should map in-band data to metadata tracks. >>> I am proposing that the algorithm should be that different types of data >>> should go into different Timed Text Tracks, and that the track's @label >>> should reflect the type. >> >> To the extent that it is defined, it is defined here: >> >> http://www.whatwg.org/specs/web-apps/current-work/complete.html#sourcing-in-band-text-tracks >> >> But the theory, as mentioned above, is that specific types of in-band >> metadata tracks would have explicit specs written to define how the >> mapping is done. >> >> >>> Recent updates to the spec, section 4.8.10.12.2 >>> (http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#sourcing-in-band-text-tracks) >>> appear to address my concern in step 2: >>> >>> "2. Set the new text track's kind, label, and language based on the >>> semantics of the relevant data, as defined by the relevant >>> specification." >>> >>> Provided that the relevant specification defines the metadata type >>> encoding to be put in the label, e.g. application/x-eiss, >>> application/x-scte35, application/x-contentadvisory, etc. >> >> Well, the problem is that there typically is no applicable specification, >> or that it is too vague. >> >> >> On Tue, 22 Mar 2011, Lachlan Hunt wrote: >>> >>> This is regarding the recently added audioTracks and videoTracks APIs to >>> the HTMLMediaElement. >>> >>> The design of these APIs seems to be done a little strangely, in that >>> dealing with each track is done by passing an index to each method on >>> the TrackList interfaces, rather than treating the audioTracks and >>> videoTracks as collections of individual audio/video track objects. This >>> design is inconsistent with the design of the TextTrack interface, and >>> seems sub-optimal. >> >> It is intended to avoid an explosion of objects. TextTrack needs to be an >> object because it has separate state, gets targeted for events, has >> different versions (e.g. MutableTextTrack), etc. Audio and Video tracks >> are, on the other hand, rather trivial constructs. >> >> >>> The use of ExclusiveTrackList for videoTracks also seems rather >>> limiting. What about cases where the second video track is a >>> sign-language track, or some other video overlay? >> >> You use a separate <video> element. >> >> I considered this in some depth. The main problem is that you end up >> having to define a layout mechanism for videos if you allow multiple >> videos to be enabled from script (e.g. consider what the behaviour should >> be if you enable the main video, then the PiP sign language video, then >> disable the main video. What is the intrinsic dimension of the <video> >> element? Does it matter if you do it in a different order?). >> >> By making <video> be a single video's output layer, we can bypass many of >> these problems without removing expressibility (the author can still >> support multiple PiP videos). >> >> >>> There are also the use cases for controlling the volume of individual >>> tracks that are not addressed by the current spec design. >> >> Can you elaborate on these use cases? >> >> My assumption has been that in the long term, if you want to manipulate >> specific audio tracks, you would use an <audio> element and plug it into >> the Audio API for separate processing. 
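A rough sketch of the "separate <video> element" approach suggested above for a sign-language overlay; the file names, layout and sync handling are illustrative assumptions, not anything the spec mandates:

  <div style="position: relative">
    <video id="main" src="talk.webm" controls></video>
    <video id="signing" src="talk-signing.webm"
           style="position: absolute; right: 0; bottom: 0; width: 25%"></video>
  </div>
  <script>
    var main = document.getElementById('main');
    var signing = document.getElementById('signing');
    signing.muted = true; // only the main track provides audio
    // Keep the overlay roughly in sync with the main video.
    main.addEventListener('play', function () { signing.play(); });
    main.addEventListener('pause', function () { signing.pause(); });
    main.addEventListener('seeked', function () {
      signing.currentTime = main.currentTime;
    });
  </script>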
>> >> >> On Sat, 2 Apr 2011, Bruce Lawson wrote: >>> >>> From a comment in a blog post of mine about longdesc >>> (http://www.brucelawson.co.uk/2011/longdesc-in-html5/comment-page-1/#comment-749853) >>> I'm wondering if this is an appropriate use of <details> >>> >>> <details> >>> <summary> >>> <img src=chart.png alt="Graph of percentage of total U.S. >>> non-institutionalized population age 16-64 declaring one or more >>> disabilities"> >>> </summary> >>> <p>The bar graph shows the percentage of total U.S. non-institutionalized >>> population age 16-64 declaring one or more disabilities. The percentage >>> value for each category is as follows:</p> >>> <ul> >>> <li>Total declaring one or more >>> disabilities: 18.6 percent </li> >>> <li>Sensory (visual and hearing): 2.3 >>> percent</li> >>> <li>Physical: 6.2 percent</li> >>> <li>Mental: 3.8 percent</li> >>> <li>Self-care: 1.8 percent</li> >>> <li>Difficulty going outside the home: >>> 6.4 percent</li> >>> <li>Employment disability: 11.9 >>> percent</li> >>> </ul> >>> <p>data retrieved from <a >>> href="http://www.census.gov/prod/2003pubs/c2kbr-17.pdf" title="Link to >>> External Site" class="external">2000 U.S. Census<span> - >>> external link</span></a></p> >>> </details> >>> >>> ... thereby acting as a discoverable-by-anyone longdesc. (The example is >>> adapted from the longdesc example at >>> http://webaim.org/techniques/images/longdesc#longdesc) >>> >>> Note to grumpy people: I'm not trying to advocate abolishing longdesc, >>> just seeing whether details can be used as an alternative. >> >> It's a bit weird, but sure. >> >> (Well, except for your alt="" text, which is a title="", not an alt="".) >> >> >> On Sat, 2 Apr 2011, John Foliot wrote: >>> >>> Interesting question. Referring to the spec, I think that you may have >>> in fact uncovered a bug in the text. The spec states: >>> >>> "The user agent should allow the user to request that the details >>> be shown or hidden." >>> >>> The problem (or potential problem) here is that the behaviour is defined >>> in visual terms - >> >> The spec explicitly says that these terms have non-visual meaning. >> >> >> On Mon, 4 Apr 2011, Bjartur Thorlacius wrote: >>> >>> IMO, the specification of the <details> element is overly focused on >>> expected renderings. Rather than explicitly defining the semantics of >>> <details> with or without an @open attribute, and with or without a >>> <summary> child, sane renderings for medium to large displays with which >>> the user can interact are described, and usage is to be inferred >>> therefrom. This is suboptimal, as it allows hiding <details open>s on >>> small output windows but advises against it as strongly as against ignoring >>> addition of the open attribute. Note that the <details> element >>> represents a disclosure widget, but the contents are nowhere defined >>> (neither as additional information (that a user-agent may or may not >>> render, depending on factors such as scarcity of screen estate), nor as >>> spoiling information that shouldn't be provided to the user without >>> explicit consent). I regard the two different use cases as different, >>> even though vendors might implement both with { binding: details; } on >>> some media. <Details> can't serve both. It's often spoken of as if >>> intended for something else than the YouTube video description use case. >>> <Details> mustn't be used for hiding spoilers, or else browsers won't be >>> able to intelligently choose to render the would-be concealed contents. 
>>
>> I've clarified <details> to be better defined in this respect. I hope it addresses your concern.
>>
>>
>> On Fri, 22 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> I wonder if it makes sense to introduce a set of pseudo-classes on the video/audio elements, each reflecting a state of the media on the controls (playing/paused/error/etc.)? Then, we could use just CSS to style media controls (whether native or custom), and not have to listen to DOM events just to tweak their appearance.
>>
>> On Sat, 23 Apr 2011, Philip Jägenstedt wrote:
>>>
>>> With a sufficiently large set of pseudo-classes it might be possible to *display* most of the interesting state, but how would you *change* the state without using scripts? Play/pause, seek, volume, etc...
>>
>> On Sat, 23 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> This is not the goal of using pseudo-classes: they just provide you with a uniform way to react to changes.
>>
>> On Sat, 23 Apr 2011, Philip Jägenstedt wrote:
>>>
>>> In other words, one would still have to rely heavily on scripts to actually implement custom controls?
>>>
>>> Also, how would one style a progress bar using pseudo-classes? How about displaying elapsed/remaining time in the form MM:SS?
>>
>> On Sat, 23 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> I am not in any way trying to invent a magical way to style media controls entirely in CSS. Just trying to make the job of controls developers easier and use CSS where it's, well... useful? :)
>>
>> On Sat, 23 Apr 2011, Philip Jägenstedt wrote:
>>>
>>> Very well, what specific set of pseudo-classes do you think would be useful?
>>
>> On Sat, 23 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> I can infer what would be useful from WebKit's media controls as a first stab?
>>
>> On Mon, 25 Apr 2011, Silvia Pfeiffer wrote:
>>>
>>> A markup and CSS example would make things clearer. How do you think it would look?
>>
>> On Sun, 24 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> Based on WebKit's current media controls, let's start with these pseudo-classes:
>>>
>>> Play state:
>>> - loading
>>> - playing
>>> - streaming
>>> - error
>>>
>>> Capabilities:
>>> - no-audio
>>> - no-video
>>> - has-closed-captioning
>>>
>>> So, to show a status message while the control is loading or streaming, and hide it when it's done:
>>>
>>> video -webkit-media-controls-status-display {
>>>     display: none;
>>> }
>>>
>>> video:loading -webkit-media-controls-status-display,
>>> video:streaming -webkit-media-controls-status-display {
>>>     display: initial;
>>>     ...
>>> }
>>>
>>> Similarly, to hide volume controls when there's no audio:
>>>
>>> video:no-audio -webkit-media-controls-volume-slider-container {
>>>     display: none;
>>> }
>>>
>>> Once I put these pseudo-classes in place for WebKit, a lot of the code in http://codesearch.google.com/codesearch/p#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/html/shadow/MediaControlRootElement.cpp&exact_package=chromium will go away, being replaced with straight CSS.
>>
>> Sounds to me like a poor man's XBL. I'd much rather see this addressed using a full-on binding solution, since it seems like it would be only a little more complex yet orders of magnitude more powerful.
>>
>>
>> On Fri, 13 May 2011, Narendra Sisodiya wrote:
>>>
>>> What I want is a general-purpose synchronisation mechanism whereby resources (text, video, graphics, etc.) are played over a general-purpose timer (timeline) with interaction.
>>>
>>> Ex -
>>>
>>> <resource type="html" src="asd.html" x="50%" y="50%" width="10%" height="10%" z="6" xpath="page1" tIn="5000ms" tOut="9400ms" inEffect="fadein" outEffect="fadeout" inEffectDur="1000ms" outEffectDur="3000ms"/>
>>>
>>> <resource type="html" src="Indian.ogv" x="50%" y="50%" width="10%" height="10%" z="6" xpath="page2" tIn="5000ms" tOut="9400ms" inEffect="fadein" outEffect="fadeout" inEffectDur="1000ms" outEffectDur="3000ms"/>
>>
>> Sounds like SMIL. I recommend looking into SMIL and SVG (which includes parts of SMIL).
>>
>>
>> On Fri, 13 May 2011, Philip Jägenstedt wrote:
>>>
>>> Problem:
>>>
>>> <video src="video.webm"></video>
>>> ...
>>> <script>
>>> document.querySelector('video').oncanplay = function() {
>>>   /* will it run? */
>>> };
>>> </script>
>>>
>>> In the above, the canplay event can be replaced with many others, like loadedmetadata and loadeddata. Whether or not the event handler has been registered by the time the event is fired depends on how fast decoding is, how fast the network is, and how much "..." there is.
>>
>> Yes, if you add an event listener in a task that runs after the task that fires the event could have run, you won't always catch the event.
>>
>> That's just a bug in the JS.
>>
>>
>> On Fri, 13 May 2011, Henri Sivonen wrote:
>>>
>>> <iframe src=foo.html></iframe>
>>> <script>
>>> document.querySelector('iframe').onload = function() {
>>>   /* will it run? */
>>> };
>>> </script>
>>> has the same problem. The solution is using the onload markup attribute that calls a function declared in an earlier <script>:
>>>
>>> <script>
>>> function iframeLoaded() {
>>>   /* It will run! */
>>> }
>>> </script>
>>> <iframe src=foo.html onload=iframeLoaded()></iframe>
>>
>> Exactly.
>>
>>
>> On Sat, 14 May 2011, Ojan Vafai wrote:
>>>
>>> If someone proposed a workable solution, browsers would likely implement it. I can't think of a backwards-compatible solution to this, so I agree that developers just need to learn that this is a bad pattern. I could imagine browsers logging a warning to the console in these cases, but I worry that it would fire too much in today's web.
>>
>> Indeed.
>>
>>
>>> It's unfortunate that you need to use an inline event handler instead of one registered via addEventListener to avoid the race condition. Exposing something to the platform like jQuery's live event handlers (http://api.jquery.com/live/) could mitigate this problem in practice, e.g. it would be just as easy or easier to register the event handler before the element is created.
>>
>> You can also work around it by setting src="" from script after you've used addEventListener, or by checking the state manually after you've added the handler and calling the handler if it is too late (though you have to be aware of the situation where the event is actually already scheduled and you added the listener between the time it was scheduled and the time it fired, so your function really has to be idempotent).
>>
>>
>> On Sun, 15 May 2011, Olli Pettay wrote:
>>>
>>> There is no need to use an inline event handler. One can always add a capturing listener to window, for example:
>>>
>>> window.addEventListener("canplay",
>>>   function(e) {
>>>     if (e.target == document.querySelector('video')) {
>>>       // Do something.
>>>     }
>>>   }, true);
>>>
>>> And just do that before the <video> element occurs in the page. That is simple, IMHO.
>>
>> Indeed, that is another option.
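As a small illustration of the check-the-state-manually workaround described above (this is a sketch, not spec text: the variable and function names are illustrative, HAVE_FUTURE_DATA is the readiness threshold that corresponds to canplay, and the handler is written to be idempotent because the queued event may still arrive after the manual call):

    <video id="v" src="video.webm"></video>
    <script>
      var video = document.getElementById('v');
      var started = false;

      function onCanPlay() {
        if (started) return;   // idempotent: the real event may still fire
        started = true;
        /* set up custom controls, start playback, etc. */
      }

      video.addEventListener('canplay', onCanPlay, false);

      // If canplay already fired before the listener was added, the
      // element's readyState reflects it, so call the handler directly.
      if (video.readyState >= video.HAVE_FUTURE_DATA) {
        onCanPlay();
      }
    </script>

The other workaround mentioned, leaving the element without a src in the markup and assigning src from script only after addEventListener has been called, avoids the check entirely, at the cost of not starting the fetch until the script runs.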
>>
>>
>>> (I wonder why the "firing a simple event named e" algorithm defaults to non-bubbling. It makes many things harder than they should be.)
>>
>> The default is arbitrary and doesn't affect the platform (since I have to decide with each event whether to use the default or not). Changing the default would make no difference (I'd just have to go to every place that calls the algorithm and switch it from "bubbles" to nothing and nothing to "does not bubble").
>>
>>
>> On Sun, 15 May 2011, Glenn Maynard wrote:
>>>
>>> If a MediaController is being used it's more complicated; there seems to be no way to query the readyState of a MediaController (almost, but not quite, the "most recently reported readiness state"), or to get a list of slaved media elements from a MediaController without searching for them by hand.
>>
>> If you're scripting the MediaController, the assumption is that you created it, so there's no problem. The implied MediaControllers are for the declarative case where you don't need scripting at all.
>>
>>
>> On Mon, 16 May 2011, Simon Pieters wrote:
>>>
>>> The state can have changed before the event has actually fired, since state changes are sync but the events are queued. So if the script happens to run in between, then func is run twice.
>>
>> That's true.
>>
>>
>> On Mon, 16 May 2011, Remy Sharp wrote:
>>>
>>> Now you're right, whoever pointed out the 7am alarm example: if you attach the event too late, then you'll miss the boat. However, it's a chicken-and-egg situation. You don't have the DOM so you can't attach the event handler, and if you do have the DOM, the damn event has fired already.
>>>
>>> What's the fix? Well, the workarounds are certainly viable, again from an everyman developer point of view:
>>>
>>> 1) Attach higher up, on the window object, listen for canplay/loadedmetadata/etc., and check the event.target
>>>
>>> 2) Attach an inline event handler (not nice, but will do)
>>>
>>> The fix? Since ultimately we have exactly the same potential "bug" with image load events
>>
>> Not just those; also iframes, own document navigation, sockets, XHR, anything that does asynchronous work, in fact.
>>
>>
>>> is to update the specification and make it clear: that depending on the speed of the connection and decoding, the following "xyz" events can fire **before** your script runs. Therefore, here are a couple of workarounds - or just be aware.
>>
>> I don't really know where to put this that would actually help.
>>
>>
>> On Tue, 17 May 2011, Philip Jägenstedt wrote:
>>>
>>> Still, I don't think just advocacy is any kind of solution. Given that you (the co-author of an HTML5 book) make certain assumptions about the outcome of this race condition, it's safe to assume that hordes of web developers will do the same.
>>>
>>> To target this specific pattern, one hypothetical solution would be to special-case the first script that attaches event handlers to a <video> element. After it has run, all events that were already fired before the script are fired again. However, this seems awfully messy if the script also observes readyState or networkState. It might also interfere with browsers that use scripts behind the scenes to implement the native controls.
>>>
>>> Although a kludge, another solution might be to block events from being fired until x more bytes of the document have been parsed or it has finished loading.
>>
>> On Wed, 18 May 2011, Robert O'Callahan wrote:
>>>
>>> For certain kinds of events ("load", the video events, maybe more), delay the firing of such events until, say, after DOMContentLoaded has fired. If you're careful you might be able to make this a strict subset of the behaviors currently allowed by the spec ... i.e. you're pretending that your frame, image and video loads simply didn't complete until after DOMContentLoaded fired in the outer page. That would mean it's compatible with properly-written legacy content ... if there is any.
>>>
>>> Of course I have no idea whether that approach is actually feasible :-). It obviously isn't compatible with what browsers currently do, so authors wouldn't want to rely on it for a long time, if ever.
>>
>> These don't seem like workable solutions. We can't delay load events for every image on the Web, surely. Remembering every event that's ever fired for any <img> or <video> just in case a handler is later attached seems a bit intractable, too.
>>
>> This has been a problem since JavaScript was added in the 90s. I find it hard to believe that we have to suddenly fix it now.
>>
>>
>> On Tue, 24 May 2011, Silvia Pfeiffer wrote:
>>>
>>> Ian and I had a brief conversation recently where I mentioned a problem with extended text descriptions with screen readers (and worse still with braille devices), and the suggestion was that the "paused for user interaction" state of a media element may be the solution. I would like to pick this up and discuss in detail how that would work, to confirm my sketchy understanding.
>>>
>>> *The use case:*
>>>
>>> In the specification for media elements we have a <track> kind of "descriptions", which are: "Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is unavailable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind). Synthesized as a separate audio track."
>>>
>>> I'm for now assuming that the synthesis will be done through a screen reader and not through the browser itself, thus making the descriptions available to users as synthesized audio or as braille if the screen reader is set up for a braille device.
>>>
>>> The textual descriptions are provided as chunks of text with a start and an end time (so-called "cues"). The cues are processed during video playback as the video's playback time starts to fall within the time frame of the cue. Thus, it is expected that the cues are consumed during the cue's time frame and are not present any more when the end time of the cue is reached, so they don't conflict with the video's normal audio.
>>>
>>> However, on many occasions it is not possible to consume the cue text in the given time frame. In particular not in the following situations:
>>>
>>> 1. The screen reader takes longer to read out the cue text than the cue's time frame provides for. This is particularly the case with long cue text, but also when the screen reader's reading rate is slower than what the author of the cue text expected.
>>>
>>> 2. The braille device is used for reading. Since reading braille is much slower than listening to read-out text, the cue time frame will invariably be too short.
>>>
>>> 3. The user seeked right into the middle of a cue, and thus the time frame that is available for reading out the cue text is shorter than the cue author allowed for.
>>>
>>> Correct me if I'm wrong, but it seems that what we need is a way for the screen reader to pause the video element from continuing to play while the screen reader is still busy delivering the cue text. (In a11y talk: what is required is a means to deal with "extended descriptions", which extend the timeline of the video.) Once it's finished presenting, it can resume the video element's playback.
>>
>> Is it a requirement that the user be able to use the regular video pause, play, rewind, etc. controls to seek inside the extended descriptions, or should they literally pause the video while playing, with the audio descriptions being controlled by the same UI as the screen reader?
>>
>>
>>> IIUC, a video is "paused for user interaction" basically when the UA has decided to pause the video without the user asking to pause it (i.e. the paused attribute is false) and the pausing happened not for network buffering reasons, but for other reasons. IIUC one concrete situation where this state is used is when the UA has reached the end of the resource and is waiting for more data to come (e.g. on a live stream).
>>
>> That latter state is not "paused for user interaction", it's just stalled due to lack of data. The rest is accurate though.
>>
>>
>>> To use "paused for user interaction" for extending descriptions, we need to introduce a means for the screen reader to tell the UA to pause the video when it reaches the end of the cue and it's still busy delivering a cue's text. Then, as it finishes, it will un-pause the video to let it continue playing.
>>>
>>> To me it sounds like a feasible solution.
>>>
>>> The screen reader could even provide a user setting and a shortcut so a user can decide that they don't want this pausing to happen, or that they want to move on from the current cue.
>>>
>>> Another advantage of this approach is that e.g. a deaf-blind user could hook up their braille device such that it will deliver the extended descriptions and also deliver captions through braille, with such extension pausing happening. (Not sure that such a user would even want to play the video, but it would be possible.)
>>>
>>> Now, I think there is one problem though (at least as far as I can tell). Right now, IIUC, screen readers are only passive listeners on the UA. They don't influence the behaviour of the UA. The accessibility API is basically only a one-way street from the UA to the AT. I wonder if that is a major inhibitor of using this approach, or whether it's easy for UAs to overcome this limitation? (Or if such a limitation even exists - I don't know enough about how ATs work...)
>>>
>>> Is that an issue? Are there other issues that I have overlooked?
>>
>> That seems to be entirely an implementation issue.
>>
>> --
>> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
>> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
>> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>
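For concreteness, a small sketch of the setup the extended-descriptions use case assumes, and of the pause-while-describing behaviour under discussion. The file names are placeholders, descriptionsPending() and whenDescriptionsDone() are hypothetical stand-ins for information only the screen reader actually has, and in the proposal above the pausing would be driven by the UA and the AT over the accessibility API rather than by page script:

    <video id="v" src="video.webm" controls>
      <track kind="descriptions" src="descriptions.vtt" srclang="en"
             label="Audio descriptions">
    </video>
    <script>
      var video = document.getElementById('v');
      var track = video.textTracks[0];
      // Cues are only processed while the track is enabled; the mode API was
      // numeric in 2011-era drafts and is string-valued in later ones.
      track.mode = (typeof track.HIDDEN !== 'undefined') ? track.HIDDEN : 'hidden';
      track.addEventListener('cuechange', function () {
        // A cue just ended but its description is still being presented:
        // extend the timeline by pausing, then resume once it is done.
        if (this.activeCues.length === 0 && descriptionsPending()) {
          video.pause();
          whenDescriptionsDone(function () { video.play(); });
        }
      }, false);
    </script>

The one-way-street limitation raised at the end of the message is exactly what such a script-level approximation sidesteps: the page can pause and resume itself, whereas the screen reader, as things stood, could not.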