- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Sat, 4 Jun 2011 11:40:39 +1000
- To: www-archive@w3.org
It's there now: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/031916.html S. On Fri, Jun 3, 2011 at 6:21 PM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote: > Seems this mail was not archived at > http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-June/ > Thus forwarding it for archiving. > Regards, > Silvia. > > On Fri, Jun 3, 2011 at 9:28 AM, Ian Hickson <ian@hixie.ch> wrote: >> >> (Note that I have tried to only reply to each suggestion once, so >> subsequent requests for the same feature are not included below.) >> >> (I apologise for the somewhat disorganised state of this e-mail. I >> normally try to group topics together, but the threads I'm responding to >> here jumped back and forth across different issues quite haphazardly and >> trying to put related things together broke some of the flow and context >> of the discussions, so I opted in several places to leave the context as >> it was originally presented, and just jump back and forth amongst the >> topics raised. Hopefully it's not too confusing.) >> >> On Thu, 9 Dec 2010, Silvia Pfeiffer wrote: >>> >> > > >>> >> > > Sure, but this is only a snippet of an actual application. If, >>> >> > > e.g., you want to step through a list of videos (maybe an >>> >> > > automated playlist) using script and you need to provide at least >>> >> > > two different formats with <source>, you'd want to run this >>> >> > > algorithm frequently. >>> >> > >>> >> > Just have a bunch of <video>s in the markup, and when one ends, >>> >> > hide it and show the next one. Don't start dynamically manipulating >>> >> > <source> elements, that's just asking for pain. >>> >> > >>> >> > If you really must do it all using script, just use canPlayType and >>> >> > the <video src=""> attribute, don't mess around with <source>. >>> >> >>> >> Thanks for adding that advice. I think it's important to point that >>> >> out. >>> > >>> > I can add it to the spec too if you think that would help. Where would >>> > a good place for it be? >>> >>> There is a note in the <source> element section that reads as follows: >>> "Dynamically modifying a source element and its attribute when the >>> element is already inserted in a video or audio element will have no >>> effect. To change what is playing, either just use the src attribute on >>> the media element directly, or call the load() method on the media >>> element after manipulating the source elements." >>> >>> Maybe you can add some advice there to use canPlayType to identify what >>> type of resource to add in the @src attribute on the media element. >>> Also, you should remove the last half of the second sentence in this >>> note if that is not something we'd like to encourage. >> >> Done. >> >> >> On Wed, 8 Dec 2010, Kevin Marks wrote: >>> >>> One case where posters come back after playback is complete is when >>> there are multiple videos on the page, and only one has playback focus >>> at a time, such as a page of preview movies for longer ones to purchase. >>> >>> In that case, showing the poster again on blur makes sense conceptually. >>> >>> It seems that getting back into the pre-playback state, showing the >>> poster again would make sense in this context. >>> >>> That would imply adding an unload() method that reverted to that state, >>> and could be used to make any cached media data purgeable in favour of >>> another video that is subsequently loaded. >> >> You don't need unload(), you can just use load(). It essentially resets >> the media element. 
>> >> It's not hugely efficient, but if we find people are trying to do this a >> lot, then we can add a more efficent variant that just resets the poster >> frame state, I guess. (I'd probably call it stop(), though, not unload().) >> >> >> On Thu, 9 Dec 2010, David Singer wrote: >>> >>> I think if you want that effect, you flip what's visible in an area of >>> the page between a playing video, and an image. Relying on the poster >>> is not effective, IMHO. >> >> I don't know, I think it would make semantic sense to have all the videos >> be <video> elements if they're actually going to be played right there. >> >> >> On Thu, 9 Dec 2010, Kevin Marks wrote: >>> >>> I know it's not effective at the moment; it is a common use case. >>> QuickTime had the 'badge' ux for years that hardly anyone took advantage >>> of: >>> >>> http://www.mactech.com/articles/mactech/Vol.16/16.02/Feb00QTToolkit/index.html >>> >>> What we're seeing on the web is a converged implementation of the >>> YouTube-like overlaid grey play button, but this is effectively >>> reimplemented independently by each video site that enables embedding. >>> >>> As we see HTML used declaratively for long-form works like ebooks on >>> lower performance devices, having embedded video that doesn't >>> cumulatively absorb all the memory available is going to be like the old >>> CD-ROM use cases the QT Badge was meant for. >> >> This seems like a presentational issue, for which CSS would be better >> positioned to provide a solution. >> >> >> On Thu, 9 Dec 2010, Boris Zbarsky wrote: >>> On 12/8/10 8:19 PM, Ian Hickson wrote: >>> > Boris wrote: >>> > > You can't sniff in a toplevel browser window. Not the same way that >>> > > people are sniffing in <video>. It would break the web. >>> > >>> > How so? >>> >>> People actually rely on the not-sniffing behavior of UAs in actual >>> browser windows in some cases. For example, application/octet-stream at >>> toplevel is somewhat commonly used to force downloads without a >>> corresponding Content-Disposition header (poor practice, but support for >>> Content-Disposition hasn't been historically great either). >>> >>> > (Note that the spec as it stands takes a compromise position: the >>> > content is only accepted if the Content-Type and type="" values are >>> > supported types (if present) and the content sniffs as a supported >>> > type, but nothing in the spec checks that all three values are the >>> > same.) >>> >>> Ah, I see. So similar to the way <img> is handled... >>> >>> I can't quite decide whether this is the best of both worlds, or the >>> worst. ;) >> >> Yeah, I hear ya. >> >> >>> It certainly makes it simpler to implement video by delegating to >>> QuickTime or the like, though I suspect such an implementation would >>> also end up sniffing types the UA doesn't necessarily claim to >>> support.... so maybe it's not simpler after all. >> >> Indeed. >> >> At this point I'm basically just waiting to see what implementations end >> up doing. When I tried moving us more towards sniffing, there was >> pushback; when I tried moving us more towards honouring types, there was >> equal and opposite pushback. So at this point, I'm letting the market >> decide it. 
:-) >> >> >> On Thu, 9 Dec 2010, Simon Pieters wrote: >>> On Thu, 09 Dec 2010 02:58:12 +0100, Ian Hickson <ian@hixie.ch> wrote: >>> > On Wed, 1 Sep 2010, Simon Pieters wrote: >>> > > >>> > > I think it might be good to run the media element load algorithm >>> > > when setting or changing src on <source> (that has a media element >>> > > as its parent), but not type and media (what's the use case for type >>> > > and media?). However it would fire an 'emptied' event for each >>> > > <source> that changed, which is kind of undesirable. Maybe the media >>> > > element load algorithm should only be invoked if src is set or >>> > > changed on a <source> that has no previous sibling <source> >>> > > elements? >>> > >>> > What's the use case? Just set .src before you insert the element. >>> >>> The use case under discussion is changing to another video. So the >>> element is already inserted and already has src. >>> >>> Something like: >>> >>> <video controls autoplay> >>> <source src=video1.webm type=video/webm> >>> <source src=video1.mp4 type=video/mp4> >>> </video> >>> <script> >>> function loadVideo(src) { >>> var video = document.getElementsByTagName('video')[0]; >>> sources = video.getElementsByTagName('source'); >>> sources[0].src = src + '.webm'; >>> sources[1].src = src + '.mp4'; >>> } >>> </script> >>> <input type="button" value="See video 1" onclick="loadVideo('video1')"> >>> <input type="button" value="See video 2" onclick="loadVideo('video2')"> >>> <input type="button" value="See video 3" onclick="loadVideo('video3')"> >> >> Well if you _really_ want to do that, just call video.load() at the end of >> loadVideo(). But really, you're better off poking around with >> canPlayType() and setting video.src directly instead of using <source> >> for these dynamic cases. >> >> >> On Thu, 9 Dec 2010, Kevin Carle wrote something more or less like: >>> >>> function loadVideo(src) { >>> var video = document.getElementsByTagName('video')[0]; >>> if (video.canPlayType("video/webm") != "") >>> video.src = src + '.webm'; >>> else >>> video.src = src + '.mp4'; >>> } >> >> Yeah. >> >> And hopefully this will become moot when there's a common video format, >> anyway. >> >> >> On Fri, 10 Dec 2010, Simon Pieters wrote: >>> >>> You'd need to remove the <source> elements to keep the document valid. >> >> You don't need them in the first place if you're doing things by script, >> as far as I can tell. >> >> >>> The author might want to have more than two <source>s, maybe with >>> media="", onerror="" etc. Then it becomes simpler to rely on the >>> resource selection algorithm. >> >> It's hard to comment without seeing a concrete use case. >> >> >> On Tue, 14 Dec 2010, Philip J盲genstedt wrote: >>> On Wed, 24 Nov 2010 17:11:02 +0100, Eric Winkelman <E.Winkelman@cablelabs.com> >>> wrote: >>> > >>> > I'm investigating how TimedTracks can be used for in-band-data-tracks >>> > within MPEG transport streams (used for cable television). >>> > >>> > In this format, the number and types of in-band-data-tracks can change >>> > over time. So, for example, when the programming switches from a >>> > football game to a movie, an alternate language track may appear that >>> > wasn't there before. Later, when the programming changes again, that >>> > language track may be removed. >>> > >>> > It's not clear to me how these changes are exposed by the proposed >>> > Media Element events. 
>>> >>> The thinking is that you switch between different streams by setting the >>> src="" attribute to point to another stream, in which case you'll get an >>> emptied event along with another bunch of events. If you have a single >>> source where audio/video/text streams appear and disappear, there's not >>> really any way to handle it. >> >> As specified, there's no way for a media element's in-band text tracks to >> change after the 'loadedmetadata' event has fired. >> >> >>> > The "loadedmetadata" event is used to indicate that the TimedTracks >>> > are ready, but it appears that it is only fired before playback >>> > begins. Is this event fired again whenever a new track is discovered? >>> > Is there another event that is intended for this situation? >>> > >>> > Similarly, is there an event that indicates when a track has been >>> > removed? Or is this also handled by the "loadedmetadata" event >>> > somehow? >>> >>> No, the loadedmetadata event is only fired once per resource, it's not >>> the event you're looking for. >>> >>> As for actual solutions, I think that having loadedmetadata fire again >>> if the number or type of streams change would make some sense. >> >> It would be helpful to know more about these cases where there are dynamic >> changes to the audio, video, or text tracks. Does this really happen on >> the Web? Do we need to handle it? >> >> >> On Thu, 16 Dec 2010, Silvia Pfeiffer wrote: >>> >>> I do not know how technically the change of stream composition works in >>> MPEG, but in Ogg we have to end a current stream and start a new one to >>> switch compositions. This has been called "sequential multiplexing" or >>> "chaining". In this case, stream setup information is repeated, which >>> would probably lead to creating a new steam handler and possibly a new >>> firing of "loadedmetadata". I am not sure how chaining is implemented in >>> browsers. >> >> Per spec, chaining isn't currently supported. The closest thing I can find >> in the spec to this situation is handling a non-fatal error, which causes >> the unexpected content to be ignored. >> >> >> On Fri, 17 Dec 2010, Eric Winkelman wrote: >>> >>> The short answer for changing stream composition is that there is a >>> Program Map Table (PMT) that is repeated every 100 milliseconds and >>> describes the content of the stream. Depending on the programming, the >>> stream's composition could change entering/exiting every advertisement. >> >> If this is something that browser vendors want to support, I can specify >> how to handle it. Anyone? >> >> >> On Sat, 18 Dec 2010, Robert O'Callahan wrote: >>> >>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#dom-media-duration says: >>> [...] >>> >>> What if the duration is not currently known? >> >> The user agent must determine the duration of the media resource before >> playing any part of the media data and before setting readyState to a >> value equal to or greater than HAVE_METADATA, even if doing so requires >> fetching multiple parts of the resource. >> >> >>> I think in general it will be very difficult for a user-agent to know >>> that a stream is unbounded. In Ogg or WebM a stream might not contain an >>> explicit duration but still eventually end. Maybe it would make more >>> sense for the last sentence to read "If the media resource is not known >>> to be bounded, ..." >> >> Done. >> >> >> On Sat, 18 Dec 2010, Philip J盲genstedt wrote: >>> >>> Agreed, this is how I've interpreted the spec already. 
If a server >>> replies with 200 OK instead of 206 Partial Content and the duration >>> isn't in the header of the resource, then the duration is reported to be >>> Infinity. If the resource eventually ends another durationchange event >>> is fired and the duration is reported to be the (now known) length of >>> the resource. >> >> That's fine. >> >> >> On Mon, 20 Dec 2010, Robert O'Callahan wrote: >>> >>> That sounds good to me. We'll probably do that. The spec will need to be >>> changed though. >> >> I changed it as you suggest above. >> >> >> On Fri, 31 Dec 2010, Bruce Lawson wrote: >>> > On Fri, 5 Nov 2010, Bruce Lawson wrote: >>> > > >>> > > http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#sourcing-in-band-timed-tracks >>> > > says to create TimedTrack objects etc for in-band tracks which are >>> > > then exposed in the API - so captions/subtitles etc that are >>> > > contained in the media container file are exposed, as well as those >>> > > tracks pointed to by the <track> element. >>> > > >>> > > But >>> > > http://www.whatwg.org/specs/web-apps/current-work/complete/video.html#timed-track-api >>> > > implies that the array is only of tracks in the track element: >>> > > >>> > > "media . tracks . length >>> > > >>> > > Returns the number of timed tracks associated with the media element >>> > > (e.g. from track elements). This is the number of timed tracks in >>> > > the media element's list of timed tracks." >>> > >>> > I don't understand why you interpret this as implying anything about >>> > the track element. Are you interpreting "e.g." as "i.e."? >>> > >>> > > Suggestion: amend to say "Returns the number of timed tracks >>> > > associated with the media element (e.g. from track elements and any >>> > > in-band track files inside the media container file)" or some such. >>> > >>> > I'd rather avoid talking about the in-band ones here, in part because >>> > I think it's likely to confuse authors at least as much as help them, >>> > and in part because the terminology around in-band timed tracks is a >>> > little unclear to me and so I'd rather not talk about them in >>> > informative text. :-) >>> > >>> > If you disagree, though, let me know. I can find a way to make it >>> > work. >>> >>> I disagree, but not aggressively vehemently. My confusion was conflating >>> "track elements" with the three instances of the phrase "timed tracks" >>> in close proximity. >>> >>> I suggest that "Returns the number of timed tracks associated with the >>> media element (i.e. from track elements and any packaged along with the >>> media in its container file)" would be clearer and avoid use of the >>> confusing phrase "in-band tracks". >> >> That's still confusing, IMHO. "Packaged" doesn't imply in-band; most >> subtitle files are going to be "packaged" with the video even if they're >> out of band. >> >> Also, your 'i.e.' here is wrong. There's at least one other source of >> tracks: the ones added by the script. >> >> The non-normative text is intentionally not overly precise, because if it >> was precise it would just be the same as the normative text and wouldn't >> be any simpler, defeating its entire purpose. >> >> >> On Mon, 3 Jan 2011, Philip J盲genstedt wrote: >>> > >>> > + I've added a magic string that is required on the format to make it >>> > recognisable in environments with no or unreliable type labeling. >>> >>> Is there a reason it's "WEBVTT FILE" instead of just "WEBVTT"? "FILE" >>> seems redundant and like unnecessary typing to me. 
>> >> It seemed more likely that non-WebVTT files would start with a line that >> said just "WEBVTT" than a line that said just "WEBVTT FILE". But I guess >> "WEBVTT FILE FORMAT" is just as likely and it'll be caught. >> >> I've changed it to just "WEBVTT"; there may be existing implementations >> that only accept "WEBVTT FILE" so for now I recommend that authors still >> use the longer header. >> >> >>> > On Wed, 8 Sep 2010, Philip J盲genstedt wrote: >>> > > >>> > > In the discussion on public-html-a11y <trackgroup> was suggested to >>> > > group together mutually exclusive tracks, so that enabling one >>> > > automatically disables the others in the same trackgroup. >>> > > >>> > > I guess it's up to the UA how to enable and disable <track>s now, >>> > > but the only option is making them all mutually exclusive (as >>> > > existing players do) or a weird kind of context menu where it's >>> > > possible to enable and disable tracks completely independently. >>> > > Neither options is great, but as a user I would almost certainly >>> > > prefer all tracks being mutually exclusive and requiring scripts to >>> > > enable several at once. >>> > >>> > It's not clear to me what the use case is for having multiple groups >>> > of mutually exclusive tracks. >>> > >>> > The intent of the spec as written was that a browser would by default >>> > just have a list of all the subtitle and caption tracks (the latter >>> > with suitable icons next to them, e.g. the [CC] icon in US locales), >>> > and the user would pick one (or none) from the list. One could easily >>> > imagine a UA allowing the user to enable multiple tracks by having the >>> > user ctrl-click a menu item, though, or some similar solution, much >>> > like with the commonly seen select box UI. >>> >>> In the vast majority of cases, all tracks are intended to be mutually >>> exclusive, such as English+English HoH or subtitles in different >>> languages. No media player UI (hardware or software) that I have ever >>> used allows enabling multiple tracks at once. Without any kind of hint >>> about which tracks make sense to enable together, I can't see desktop >>> Opera allowing multiple tracks (of the same kind) to be enabled via the >>> main UI. >> >> Personally I think it's quite reasonable to want to see two languages at >> once, or even two forms of the same language at once, especially for, >> e.g., reviewing subtitles. But I don't think it would be a bad thing if >> some browsers didn't expose that in the UI; that's something that could >> be left to bookmarklets, for example. >> >> >>> Using this syntax, I would expect some confusion when you omit the closing >>> </v>, when it's *not* a cue spoken by two voices at the same time, such as: >>> >>> <v Jim>- Boo! >>> <v Bob>- Gah! >>> >>> Gah! is spoken by both Jim and Bob, but that was likely not intended. If >>> this causes confusion, we should make validators warn about multiple >>> voices with with no closing </v>. >> >> No need to just warn, the spec says the above is outright invalid, so >> they would raise an error. >> >> >>> > > For captions and subtitles it's less common, but rendering it >>> > > underneath the video rather than on top of it is not uncommon, e.g. >>> > > http://nihseniorhealth.gov/video/promo_qt300.html or >>> > >>> > Conceptually, that's in the video area, it's just that the video isn't >>> > centered vertically. I suppose we could allow UAs to do that pretty >>> > easily, if it's commonly desired. 
>>> >>> It's already possible to align the video to the top of its content box >>> using <http://dev.w3.org/csswg/css3-images/#object-position>: >>> >>> video { object-position: center top } >>> >>> (This is already supported in Opera, but prefixed: -o-object-position) >> >> Sounds good. >> >> >>> Note that in Sweden captioning for the HoH is delivered via the teletext >>> system, which would allow ASCII-art to be displayed. Still, I've never >>> seen it. The only case of graphics being used in "subtitles" I can >>> remember ever seeing is the DVD of >>> <http://en.wikipedia.org/wiki/Cat_Soup>, where the subtitle system is >>> (ab)used to overlay some graphics. >> >> Yeah, I'm not at all concerned about not supporting graphics in subtitles. >> It's nowhere near the 80% bar. >> >> >>> If we ever want comments, we need to add support in the parser before >>> any content accidentally uses the syntax, in other words pretty soon >>> now. >> >> No, we can use any syntax that the parser currently ignores. It won't >> break backwards compat with content that already uses it by then, since >> the whole point of comments is to be ignored. The only difference is >> whether validators complain or not. >> >> >>> > On Tue, 14 Sep 2010, Anne van Kesteren wrote: >>> > > >>> > > Apart from text/plain I cannot think of a "web" text format that >>> > > does not have comments. >>> > >>> > But what's the use case? Is it really useful to have comments in a >>> > subtitle file? >>> >>> Being able to put licensing/contact information at the top of the file >>> would be useful, just as it is in JavaScript/CSS. >> >> Well the parser explicitly skips over anything in the header block >> (everything up to the first blank line IIRC), so if we find that people >> want this then we can allow it without having to change any UAs except the >> validators. >> >> >>> > On Fri, 22 Oct 2010, Simon Pieters wrote: >>> > > > >>> > > > It can still be inspired by it though so we don't have to change >>> > > > much. I'd be curious to hear what other things you'd clean up >>> > > > given the chance. >>> > > >>> > > WebSRT has a number of quirks to be compatible with SRT, like >>> > > supporting both comma and dot as decimal separators, the weird >>> > > parsing of timestamps, etc. >>> > >>> > I've cleaned the timestamp parsing up. I didn't see others. >>> >>> I consider the cue id line (the line preceding the timing line) to be >>> cruft carried over from SRT. When we now both have classes and the >>> possibility of getting a cue by index, so why do we need it? >> >> It's optional, but it is useful, especially for metadata tracks, as a way >> to grab specific cues. For example, consider a metadata or chapter track >> that contains cues with specific IDs that the site would use to jump to >> particular parts of the video in response to key presses, such as "start >> of content after intro", or maybe for a podcast with different segments, >> where the user can jump to "news" and "reviews" and "final thought" -- you >> need an ID to be able to find the right cue quickly. >> >> >>> > > There was also some discussion about metadata. Language is sometimes >>> > > necessary for the font engine to pick the right glyph. >>> > >>> > Could you elaborate on this? My assumption was that we'd just use CSS, >>> > which doesn't rely on language for this. 
>>> >>> It's not in any spec that I'm aware of, but some browsers (including >>> Opera) pick different glyphs depending on the language of the text, >>> which really helps when rendering CJK when you have several CJK fonts on >>> the system. Browsers will already know the language from <track >>> srclang>, so this would be for external players. >> >> How is this problem solved in SRT players today? >> >> >> On Mon, 14 Feb 2011, Philip J盲genstedt wrote: >>> >>> Given that most existing subtitle formats don't have any language >>> metadata, I'm a bit skeptical. However, if implementors of non-browser >>> players want to implement WebVTT and ask for this I won't stand in the >>> way (not that I could if I wanted to). For simplicity, I'd prefer the >>> language metadata from the file to not have any effect on browsers >>> though, even if no language is given on <track>. >> >> Indeed. >> >> >> On Tue, 4 Jan 2011, Alex Bishop wrote: >>> >>> Firefox too. If you visit >>> http://people.mozilla.org/~jdaggett/webfonts/serbianglyphs.html in >>> Firefox 4, the text explicitly marked-up as being Serbian Cyrillic >>> (using the lang="sr-Cyrl" attribute) uses some different glyphs to the >>> text with no language metadata. >> >> This seems to be in violation of CSS; we should probably fix it there >> before fixing it in WebVTT since WebVTT relis on CSS. >> >> >> On Mon, 3 Jan 2011, Philip J盲genstedt wrote: >>> >>> > > * The "bad cue" handling is stricter than it should be. After >>> > > collecting an id, the next line must be a timestamp line. Otherwise, >>> > > we skip everything until a blank line, so in the following the >>> > > parser would jump to "bad cue" on line "2" and skip the whole cue. >>> > > >>> > > 1 >>> > > 2 >>> > > 00:00:00.000 --> 00:00:01.000 >>> > > Bla >>> > > >>> > > This doesn't match what most existing SRT parsers do, as they simply >>> > > look for timing lines and ignore everything else. If we really need >>> > > to collect the id instead of ignoring it like everyone else, this >>> > > should be more robust, so that a valid timing line always begins a >>> > > new cue. Personally, I'd prefer if it is simply ignored and that we >>> > > use some form of in-cue markup for styling hooks. >>> > >>> > The IDs are useful for referencing cues from script, so I haven't >>> > removed them. I've also left the parsing as is for when neither the >>> > first nor second line is a timing line, since that gives us a lot of >>> > headroom for future extensions (we can do anything so long as the >>> > second line doesn't start with a timestamp and "-->" and another >>> > timestamp). >>> >>> In the case of feeding future extensions to current parsers, it's way >>> better fallback behavior to simply ignore the unrecognized second line >>> than to discard the entire cue. The current behavior seems unnecessarily >>> strict and makes the parser more complicated than it needs to be. My >>> preference is just ignore anything preceding the timing line, but even >>> if we must have IDs it can still be made simpler and more robust than >>> what is currently spec'ed. >> >> If we just ignore content until we hit a line that happens to look like a >> timing line, then we are much more constrained in what we can do in the >> future. For example, we couldn't introduce a "comment block" syntax, since >> any comment containing a timing line wouldn't be ignored. 
On the other >> hand if we keep the syntax as it is now, we can introduce a comment block >> just by having its first line include a "-->" but not have it match the >> timestamp syntax, e.g. by having it be "--> COMMENT" or some such. >> >> Looking at the parser more closely, I don't really see how doing anything >> more complex than skipping the block entirely would be simpler than what >> we have now, anyway. >> >> >> On Mon, 3 Jan 2011, Glenn Maynard wrote: >>> >>> By the way, the WebSRT hit from Google >>> (http://www.whatwg.org/specs/web-apps/current-work/websrt.html) is 404. >>> I've had to read it out of the Google cache, since I'm not sure where it >>> went. >> >> I added a redirect. >> >> >>> Inline comments (not just line comments) in subtitles are very important >>> for collaborative editing: for leaving notes about a translation, noting >>> where editing is needed or why a change was made, and so on. >>> >>> If a DOM-like interface is specified for this (presumably this will >>> happen later), being able to access inline comments like DOM comment >>> nodes would be very useful for visual editors, to allow displaying >>> comments and to support features like "seek to next comment". >> >> We can add comments pretty easily (e.g. we could say that "<!" starts a >> comment and ">" ends it -- that's already being ignored by the current >> parser), if people really need them. But are comments really that useful? >> Did SRT have problem due to not supporting inline comments? (Or did it >> support inline comments?) >> >> >> On Tue, 4 Jan 2011, Glenn Maynard wrote: >>> On Tue, Jan 4, 2011 at 4:24 AM, Philip J盲genstedt <philipj@opera.com> >>> wrote: >>> > If you need an intermediary format while editing, you can just use any >>> > syntax you like and have the editor treat it specially. >>> >>> If I'd need to write my own parser to write an editor for it, that's one >>> thing--but I hope I wouldn't need to create yet another ad hoc caption >>> format, mirroring the features of this one, just to work around a lack >>> of inline comments. >> >> An editor would need a custom parser anyway to make sure it round-tripped >> syntax errors, presumably. >> >> >>> The cue text already vaguely resembles HTML. What about <!-- comments >>> -->? It's universally understood, and doesn't require any new escape >>> mechanisms. >> >> The current parser would end a comment at the first ">", but so long as >> you didn't have a ">" in the comment, "<!--...-->" would work fine within >> cue text. (We would have to be careful in standalone blocks to define it >> in such a way that it could not be confused with a timing line.) >> >> >> On Wed, 5 Jan 2011, Philip J盲genstedt wrote: >>> >>> The question is rather if the comments should be exposed as DOM comment >>> nodes in getCueAsHTML, which seems to be what you're asking for. That >>> would only be possible if comments were only allowed inside the cue >>> text, which means that you couldn't comment out entire cues, as such: >>> >>> 00:00.000 --> 00:01.000 >>> one >>> >>> /* >>> 00:02.000 --> 00:03.000 >>> two >>> */ >>> >>> 00:04.000 --> 00:05.000 >>> three >>> >>> Therefore, my thinking is that comments should be removed during parsing >>> and not be exposed to any layer above it. >> >> We can support both, if there's really demand for it. >> >> For example: >> >> 00:00.000 --> 00:01.000 >> one <! inline comment > one >> >> COMMENT--> >> 00:02.000 --> 00:03.000 >> two; this is entirely >> commented out >> >> <! 
this is the ID line >> 00:04.000 --> 00:05.000 >> three; last line is a ">" >> which is part of the cue >> and is not a comment. >> > >> >> The above would work today in a conforming UA. The question really is what >> parts of this do we want to support and what do we not care enough about. >> >> >> On Wed, 5 Jan 2011, Anne van Kesteren wrote: >>> On Wed, 05 Jan 2011 10:58:56 +0100, Philip J盲genstedt >>> <philipj@opera.com> wrote: >>> > Therefore, my thinking is that comments should be removed during >>> > parsing and not be exposed to any layer above it. >>> >>> CSS does that too. It has not caused problems so far. It does mean >>> editing tools need a slightly different DOM, but that is always the case >>> as they want to preserve whitespace details, etc., too. At least editors >>> that have both a text and visual interface. >> >> Right. >> >> >> On Fri, 14 Jan 2011, Silvia Pfeiffer wrote: >>> >>> We are concerned, however, about the introduction of WebVTT as a >>> universal captioning format *when used outside browsers*. Since a subset >>> of CSS features is required to bring HTML5 video captions on par with TV >>> captions, non-browser applications will need to support these CSS >>> features, too. However, we do not believe that external CSS files are an >>> acceptable solution for non-browser captioning and therefore think that >>> those CSS features (see [1]) should eventually be made part of the >>> WebVTT specification. >>> >>> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#the-'::cue'-pseudo-element >> >> I'm not sure what you mean by "made part of the WebVTT specification", but >> if you mean that WebVTT should support inline CSS, that does seem line >> something we can add, e.g. using syntax like this: >> >> WEBVTT >> >> STYLE--> >> ::cue(v[voice=Bob]) { color: green; } >> ::cue(c.narration) { font-style: italic; } >> ::cue(c.narration i) { font-style: normal; } >> >> 00:00.000 --> 00:02.000 >> Welcome. >> >> 00:02.500 --> 00:05.000 >> To WebVTT. >> >> I suggest we wait until WebVTT and '::cue' in particular have shipped in >> at least one browser and been demonstrated as being useful before adding >> this kind of feature though. >> >> >>> 1. Introduce file-wide metadata >>> >>> WebVTT requires a structure to add header-style metadata. We are here >>> talking about lists of name-value pairs as typically in use for header >>> information. The metadata can be optional, but we need a defined means >>> of adding them. >>> >>> Required attributes in WebVTT files should be the main language in use >>> and the kind of data found in the WebVTT file - information that is >>> currently provided in the <track> element by the @srclang and @kind >>> attributes. These are necessary to allow the files to be interpreted >>> correctly by non-browser applications, for transcoding or to determine >>> if a file was created as a caption file or something else, in particular >>> the @kind=metadata. @srclang also sets the base directionality for BiDi >>> calculations. >>> >>> Further metadata fields that are typically used by authors to keep >>> specific authoring information or usage hints are necessary, too. As >>> examples of current use see the format of MPlayer mpsub’s header >>> metadata [2], EBU STL’s General Subtitle Information block [3], and >>> even CEA-608’s Extended Data Service with its StartDate, Station, >>> Program, Category and TVRating information [4]. 
Rather than specifying a >>> specific subset of potential fields we recommend to just have the means >>> to provide name-value pairs and leave it to the negotiation between the >>> author and the publisher which fields they expect of each other. >>> >>> [2] http://www.mplayerhq.hu/DOCS/tech/mpsub.sub >>> [3] https://docs.google.com/viewer?a=v&q=cache:UKnzJubrIh8J:tech.ebu.ch/docs/tech/tech3264.pdf >>> [4] http://edocket.access.gpo.gov/cfr_2007/octqtr/pdf/47cfr15.119.pdf >> >> I don't understand the use cases here. >> >> CSS and JS don't have anything like this, why should WebVTT? What problem >> is this solving? How did SRT solve this problem? >> >> >>> 2. Introduce file-wide cue settings >>> >>> At the moment if authors want to change the default display of cues, >>> they can only set them per cue (with the D:, S:, L:, A: and T:. cue >>> settings) or have to use an external CSS file through a HTML page with >>> the ::cue pseudo-element. In particular when considering that all >>> Asian language files would require a “D:vertical” marker, it becomes >>> obvious that this replication of information in every cue is >>> inefficient and a waste of bandwidth, storage, and application speed. >>> A cue setting default section should be introduced into a file >>> header/setup area of WebVTT which will avoid such replication. >>> >>> An example document with cue setting defaults in the header could look >>> as follows: >>> == >>> WEBVTT >>> Language=zh >>> Kind=Caption >>> CueSettings= A:end D:vertical >>> >>> 00:00:15.000 --> 00:00:17.950 >>> 在左边我们可以看到... >>> >>> 00:00:18.160 --> 00:00:20.080 >>> 在右边我们可以看到... >>> >>> 00:00:20.110 --> 00:00:21.960 >>> ...捕蝇草械. >>> == >>> >>> Note that you might consider that the solution to this problem is to use >>> external CSS to specify a change to all cues. However, this is not >>> acceptable for non-browser applications and therefore not an acceptable >>> solution to this problem. >> >> Adding defaults seems like a reasonable feature. We could add this just by >> adding the ability to have a block in a VTT file like this: >> >> WEBVTT >> >> DEFAULTS --> A:vertical A:end >> >> 00:00.000 --> 00:02.000 >> This is vertical and end-aligned. >> >> 00:02.500 --> 00:05.000 >> As is this. >> >> DEFAULTS --> A:start >> >> 00:05.500 --> 00:07.000 >> This is horizontal and start-aligned. >> >> However, again I suggest that we wait until WebVTT has been deployed in at >> least one browser before adding more features like this. >> >> >>> * positioning: Generally the way in which we need positioning to work is >>> to provide an anchor position for the text and then explain in which >>> direction font size changes and the addition of more text allows the >>> text segment to grow. It seems that the line position cue (L) provides a >>> baseline position and the alignment cue (A) provides the growing >>> direction start/middle/end. Can we just confirm this understanding? >> >> It's more the other way around: the line boxes are laid out and then the >> resulting line boxes are positioned according to the A: and L: lines. In >> particular, the L: lines when given with a % character position the line >> boxes in the same manner that CSS background-position positions the >> background image, and L: lines without a % character set the position of >> the line boxes based on the height of the first line box. A: lines then >> just set the position of these line boxes relative to the other dimension. 
>> >> >>> * fontsize: When changing text size in relation to the video changing >>> size or resolution, we need to make sure not to reduce the text size >>> below a specific font size for readability reasons. And we also need to >>> make sure not to make it larger than a specific font size, since >>> otherwise it will dominate the display. We usually want the text to be >>> at least Xpx, but no bigger than Ypx. Also, one needs to pay attention >>> to the effect that significant player size changes have on relative >>> positioning - in particular for the minimum caption text size. Dealing >>> with min and max sizes is missing from the current specification in our >>> understanding. >> >> That's a CSS implementation issue. Minimum font sizes are commonly >> supported in CSS implementations. Maximum font sizes would be similar. >> >> >>> * bidi text: In our experience from YouTube, we regularly see captions >>> that contain mixed languages/directionality, such as Hebrew captions >>> that have a word of English in it. How do we allow for bidi text inside >>> cues? How do we change directionality mid-cue? Do we deal with the >>> zero-width LTR-mark and RTL-mark unicode characters? It would be good to >>> explain how these issues are dealt with in WebVTT. >> >> There's nothing special about how they work in WebVTT; they are handled >> the same as in CSS. >> >> >>> * internationalisation: D:vertical and D:vertical-lr seem to only work >>> for vertical text - how about horizontal-rl? For example, Hebrew is a >>> prime example of a language being written from right to left >>> horizontally. Is that supported and how? >> >> What exactly would horizontal-rl do? >> >> >>> * naming: The usage of single letter abbreviations for cue settings has >>> created quite a discussion here at Google. We all agree that file-wide >>> cue settings are required and that this will reduce the need for >>> cue-specific cue settings. We can thus afford a bit more readability in >>> the cue settings. We therefore believe that it would be better if the >>> cue settings were short names rather than single letter codes. This >>> would be more like CSS, too, and easier to learn and get right. In the >>> interface description, the 5 dimensions have proper names which could be >>> re-used (“direction”, “linePosition”, “textPosition”, “size” and >>> “align"). We therefore recommend replacing the single-letter cue >>> commands with these longer names. >> >> That would massively bloat these files and make editing them a huge pain, >> as far as I can tell. I agree that defaults would make it better, but many >> cues would still need their own positioning and sizing information, and >> anything beyond a very few letters would IMHO quickly become far too >> verbose for most people. "L", "A", and "S" are pretty mnemonic, "T" would >> quickly become familiar to people writing cues, and "D" is only going to >> be relevant to some authors but for those authors it's pretty >> self-explanatory as well, since the value is verbose. >> >> What I really would like to do is use "X" and "Y" instead of "T" and "L", >> but those terms would be very confusing when we flip the direction, which >> is why I used the less obvious "T" and "L". >> >> >>> * textcolor: In particular on European TV it is common to distinguish >>> between speakers by giving their speech different colors. 
The following >>> colors are supported by EBU STL, CEA-608 and CEA-708 and should be >>> supported in WebVTT without the use of external CSS: black, red, green, >>> yellow, blue, magenta, cyan, and white. As default we recommend white on >>> a grey transparent background. >> >> This is supported as 'color' and 'background'. >> >> >>> * underline: EBU STL, CEA-608 and CEA-708 support underlining of >>> characters. >> >> I've added support for 'text-decoration'. >> >> >>> The underline character is also particularly important for some Asian >>> languages. >> >> Could you elaborate on this? >> >> >>> Please make it possible to provide text underlines without the use of >>> CSS in WebVTT. >> >> Why without CSS? >> >> >>> * blink: As much as we would like to discourage blinking subtitles, they >>> are actually a core requirement for EBU STL and CEA-608/708 captions and >>> in use in particular for emergency messages and similar highly important >>> information. Blinking can be considered optional for implementation, but >>> we should allow for it in the standard. >> >> This is part of 'text-decoration'. >> >> >>> * font face: CEA-708 provides a choice of eight font tags: undefined, >>> monospaced serif, proportional serif, monospaced sans serif, >>> proportional sans serif, casual, cursive, small capital. These fonts >>> should be available for WebVTT as well. Is this the case? >> >> Yes. >> >> >>> We are not sure about the best solution to these needs. Would it be best >>> to introduce specific tags for these needs? >> >> CSS seems to handle these needs adequately. >> >> >>> We have a couple of recommendations for changes mostly for aesthetic and >>> efficiency reasons. We would like to point out that Google is very >>> concerned with the dense specification of data and every surplus >>> character, in particular if it is repeated a lot and doesn’t fulfill a >>> need, should be removed to reduce the load created on worldwide >>> networking and storage infrastructures and help render Web pages faster. >> >> This seems to contradict your earlier request to make the languge more >> verbose... >> >> >>> * Time markers: WebVTT time stamps follow no existing standard for time >>> markers. Has the use of NPT as introduced by RTSP[5] for time markers >>> been considered (in particular npt-hhmmss)? >>> >>> [5] http://www.ietf.org/rfc/rfc2326.txt >> >> WebVTT follows the SRT format, with commas replaced by periods for >> consistency with the rest of the platform. >> >> >>> * Suggest dropping “-->”: In the context of HTML, “-->” is an end >>> comment marker. It may confuse Web developers and parsers if such a sign >>> is used as a separator. For example, some translation tools expect HTML >>> or XML-based interchange formats and interpret the “>” as part of a >>> tag. Also, common caption convention often uses “>” to represent >>> speaker identification. Thus it is more difficult to write a filter >>> which correctly escapes “-->” but retains “>” for speaker ID. >> >> "-->" seems pretty mnemonic to me. I don't see why we'd want to drop it. >> >> >>> * Duration specification: WebVTT time stamps are always absolute time >>> stamps calculated in relation to the base time of synchronisation with >>> the media resource. While this is simple to deal with for machines, it >>> is much easier for hand-created captions to deal with relative time >>> stamps for cue end times and for the timestamp markers within cues. Cue >>> start times should continue to stay absolute time stamps. 
Timestamp >>> markers within cues should be relative to the cue start time. Cue end >>> times should be possible to be specified either as absolute or relative >>> timestamps. The relative time stamps could be specified through a prefix >>> of “+” in front of a “ss.mmm” second and millisecond specification. >>> These are not only simpler to read and author, but are also more compact >>> and therefore create smaller files. >> >> I think if anything is absolute, it doesn't really make anything much >> simpler for anything else to be relative, to be honest. Take the example >> you give here: >> >>> An example document with relative timestamps is: >>> == >>> WEBVTT >>> Language=en >>> Kind=Subtitle >>> >>> 00:00:15.000 +2.950 >>> At the left we can see... >>> >>> 00:00:18.160 +1.920 >>> At the right we can see the... >>> >>> 00:00:20.110 +1.850 >>> ...the <+0.400>head-<+0.800>snarlers >>> == >> >> If the author were to change the first time stamp because the video gained >> a 30 second advertisement at the start, then he would still need to change >> the hundreds of subseqent timestamps for all the additional cues. What >> does the author gain from not having to change the relative stamps? It's >> not like he's going to be doing it by hand, and once a tool is involved, >> the tool can change everything just as easily. >> >> >>> We are happy to see the introduction of the magic file identifier for >>> WebVTT which will make it easier to identify the file format. We do not >>> believe the “FILE” part of the string is necessary. >> >> I have removed it. >> >> >>> However, we recommend to also introduce a format version number that the >>> file adheres to, e.g. “WEBVTT 0.7”. >> >> Version numbers are an antipattern on the Web, so I have not added one. >> >> >>> This helps to make non-browser systems that parse such files become >>> aware of format changes. >> >> The format will never change in a non-backwards-compatible fashion once it >> is deployed, so that is not a concern. >> >> >>> It can also help identify proprietary standard metadata sets as used by >>> a specific company, such as “WEBVTT 0.7 ABC-meta1” which could signify >>> that the file adheres to WEBVTT 0.7 format specification with the >>> ABC-meta1 metadata schema. >> >> If we add metadata, then that can be handled just by having the metadata >> include that information itself. >> >> >>> CEA-708 captions support automatic line wrapping in a more sophisticated >>> way than WebVTT -- see http://en.wikipedia.org/wiki/CEA-708#Word_wrap. >>> >>> In our experience with YouTube we have found that in certain situations >>> this type of automatic line wrapping is very useful. Captions that were >>> authored for display in a full-screen video may contain too many words >>> to be displayed fully within the actual video presentation (note that >>> mobile / desktop / internet TV devices may each have a different amount >>> of space available, and embedded videos may be of arbitrary sizes). >>> Furthermore, user-selected fonts or font sizes may be larger than >>> expected, especially for viewers who need larger print. >>> >>> WebVTT as currently specified wraps text at the edge of their containing >>> blocks, regardless of the value of the 'white-space' property, even if >>> doing so requires splitting a word where there is no line breaking >>> opportunity. This will tend to create poor quality captions. 
For >>> languages where it makes sense, line wrapping should only be possible at >>> carriage return, space, or hyphen characters, but not on >>> characters. (Note that CEA-708 also contains non-breaking space and >>> non-breaking transparent space characters to help control wrapping.) >>> However, this algorithm will not necessarily work for all languages. >>> >>> We therefore suggest that a better solution for line wrapping would be >>> to use the existing line wrapping algorithms of browsers, which are >>> presumably already language-sensitive. >>> >>> [Note: the YouTube line wrapping algorithm goes even further by >>> splitting single caption cues into multiple cues if there is too much >>> text to reasonably fit within the area. YouTube then adjusts the times >>> of these caption cues so they appear sequentially. Perhaps this could >>> be mentioned as another option for server-side tools.] >> >> I've adjusted the text in the spec to more clearly require that >> line-breaking follow normal CSS rules but with the additional requirement >> that there not be overflow, which is what I had intended. >> >> >>> 1. Pop-on/paint-on/roll-up support >>> >>> Three different types of captions are common on TV: pop-on, roll-up and >>> paint-on. Captions according to CEA-608/708 need to support captions of >>> all three of these types. We believe they are already supported in >>> WebVTT, but see a need to re-confirm. >>> >>> For pop-on captions, a complete caption cue is timed to appear at a >>> certain time and disappear a few seconds later. This is the typical way >>> in which captions are presented and also how WebVTT/<track> works in our >>> understanding. Is this correct? >> >> As far as I understand, yes. >> >> >>> For roll-up captions, individual lines of captions are presented >>> successively with older lines moving up a line to make space for new >>> lines underneath. Assuming we understand the WebVTT rendering rules >>> correctly, WebVTT would identify each of these lines as an individual, >>> but time-overlapping cue with the other cues. As more cues are created >>> and overlap in time, newer cues are added below the currently visible >>> ones and move the currently visible ones up, basically creating a >>> roll-up effect. If this is a correct understanding, then this is an >>> acceptable means of supporting roll-up captions. >> >> I am not aware of anything currently in the WebVTT specification which >> will cause a cue to move after it has been placed on the video, so I do >> not believe this is a correct understanding. >> >> However, you can always have a cue be replaced by a cue with the same text >> but on a higher line, if you're willing to do some preprocessing on the >> subtitle file. It won't be a smoothly animated scroll, but it would work. >> >> If there is convincing evidence that this kind of subtitle is used on the >> Web, though, we can support it more natively. So far I've only seen it in >> legacy scenarios that do not really map to expected WebVTT use cases. >> >> For supporting those legacy scenarios, you need script anyway (to handle, >> e.g., backspace and moving the cursor). If you have script, doing >> scrolling is possible either by moving the cue, or by not using the >> default UA rendering of the cues at all and doing it manually (e.g. using >> <div>s or <canvas>). >> >> >>> Finally, for paint-on captions, individual letters or words are >>> displayed successively on screen. 
WebVTT supports this functionality >>> with the cue timestamps <xx:xx:xx.xxx>, which allows to specify >>> characters or words to appear with a delay from within a cue. This >>> essentially realizes paint-on captions. Is this correct? >> >> Yes. >> >> >>> (Note that we suggest using relative timestamps inside cues to make this >>> feature more usable.) >> >> It makes it modestly easier to do by hand, but hand-authoring a "paint-on" >> style caption seems like a world of pain regardless of the timestamp >> format we end up using, so I'm not sure it's a good argument for >> complicating the syntax with a second timestamp format. >> >> >>> The HTML spec specifies that it is not allowed to have two tracks that >>> provide the same kind of data for the same language (potentially empty) >>> and for the same label (potentially empty). However, we need >>> clarification on what happens if there is a duplicate track, ie: does >>> the most recent one win or the first one or will both be made available >>> in the UI and JavaScript? >> >> They are both available. >> >> >>> The spec only states that the combination of {kind, type, label} must be >>> unique. It doesn't say what happens if they are not. >> >> Nothing different happens if they are not than if they are. It's just a >> conformance requirement. >> >> >>> Further, the spec says nothing about duplicate labels altogether - what >>> is a browser supposed to do when two tracks have been marked with the >>> same label? >> >> That same as it does if they have different labels. >> >> >>> It is very important that there is a possibility for users to >>> auto-activate tracks. Which track is chosen as the default track to >>> activate depends on the language preferences of the user. The user is >>> assumed to have a list of language preferences which leads this choice. >> >> I've added a "default" attribute so that sites can control this. >> >> >>> In YouTube, if any tracks exist that match the first language >>> preference, the first of those is used as the default. A track with >>> no name sorts ahead of one with a name. The sorting is done according >>> to that language's collation order. In order to override this you >>> would need (1) a default=true attribute for a track which gives it >>> precedence if its language matches, and (2) a way to force the >>> language preference. If no tracks exist for the first language pref, >>> the second language pref is checked, and so on. >>> >>> If the user's language preferences are known, and there are no tracks >>> in that language, you have other options: >>> (1) offer to do auto-translation (or just do it) >>> (2) use a track in the same language that the video's audio is in (if known) >>> (3) if only one track, use the first available track >>> >>> Also make sure the language choice can be overriden by the user >>> through interaction. >>> >>> We’d like to make sure this or a similar algorithm is the recommended >>> way in which browsers deal with caption tracks. >> >> This seems to me to be a user agent quality of implementation issue. User >> preferences almost by definition can't be interoperable, so it's not >> something we can specify. >> >> >>> As far as we understand, you can currently address all cues through >>> ::cue and you can address a cue part through ::cue-part(<voice> || >>> <part> || <position> || <future-compatibility>). However, if we >>> understand correctly, it doesn’t seem to be possible to address an >>> individual cue through CSS, even though cues have individual >>> identifiers. 
This is either an oversight or a misunderstanding on our >>> parts. Can you please clarify how it is possible to address an >>> individual cue through CSS? >> >> I've made the ID referencable from the ::cue() selector argument as an ID >> on the anonymous root element. >> >> >>> Our experience with automated caption creation and positioning on >>> YouTube indicates that it is almost impossible to always place the >>> captions out of the way of where a user may be interested to look at. We >>> therefore allow users to dynamically move the caption rendering area to >>> a different viewport position to reveal what is underneath. We recommend >>> such drag-and-drop functionality also be made available for TimedTrack >>> captions on the Web, especially when no specific positioning information >>> is provided. >> >> I've added text to explicitly allow this. >> >> >> On Sat, 22 Jan 2011, Philip J盲genstedt wrote: >>> >>> Indeed, repeating settings on each cue would be annoying. However, >>> file-wide settings seems like it would easily be too broad, and you'd >>> have to explicitly reverse the effect on the cues where you don't want >>> it to apply. Maybe classes of cue settings or some kind of macros would >>> work better. >> >> My assumption is that similar cues will typically be grouped together, so >> that one could introduce the group with a "DEFAULTS" block and then >> >> >>> Nitpick: Modern Chinese, including captions, is written left-to-right, >>> top-to-bottom, just like English. >> >> Indeed. I don't expect there will be much vertical text captioning. I >> added it primarily to support some esoteric Anime cases. >> >> >> >>> That the intra-cue timings are relative but the timing lines are >>> absolute has bugged me a bit, so if the distinction was more obvious >>> just from the syntax, that'd be great! >> >> They're all absolute. >> >> >>> [for the file signature] "WebSRT" is prettier than "WEBSRT". >> >> The idea is not to be pretty, the idea is to stand out. :-) >> >> >>> I'm inclined to say that we should normalize all whitespace during >>> parsing and not have explicit line breaks at all. If people really want >>> two lines, they should use two cues. In practice, I don't know how well >>> that would fare, though. What other solutions are there? >> >> I think we definitely need line breaks, e.g. for cases like: >> >> -- Do you want to go to the zoo? >> -- Yes! >> -- Then put your shoes on! >> >> ...which is quite common style in some locales. >> >> However, I agree that we should encourage people to let browsers wrap the >> lines. Not sure how to encourage that more. >> >> >> On Sun, 23 Jan 2011, Glenn Maynard wrote: >>> >>> It should be possible to specify language per-cue, or better, per block >>> of text mid-cue. Subtitles making use of multiple languages are common, >>> and it should be possible to apply proper font selection and word >>> wrapping to all languages in use, not just the primary language. >> >> It's not clear to me that we need language information to apply proper >> font selection and word wrapping, since CSS doesn't do it. >> >> >>> When both English subtitles and Japanese captions are on screen, it >>> would be very bad to choose a Chinese font for the Japanese text, and >>> worse to choose a Western font and use it for everything, even if >>> English is the predominant language in the file. >> >> Can't you get around this using explicit styles, e.g. against classes? >> Unless this really is going to be a common problem, I'm not particularly >> concerned about it. 
>> >> >> On Mon, 24 Jan 2011, Philip Jägenstedt wrote: >>> >>> Multi-language subtitles/captions seem to be extremely uncommon, >>> unsurprisingly, since you have to understand all the languages to be >>> able to read them. >>> >>> The case you mention isn't a problem, you just specify Japanese as the >>> main language. >> >> Indeed. >> >> >>> There are a few other theoretical cases: >>> >>> * Multi-language CJK captions. I've never seen this, but outside of >>> captioning, it seems like the foreign script is usually transcribed to >>> the native script (e.g. writing Japanese names with simplified Chinese >>> characters). >>> >>> * Use of Japanese or Chinese words in mostly non-CJK subtitles. This >>> would make correct glyph selection impossible, but I've never seen it. >>> >>> * Voice synthesis of e.g. mixed English/French captions. Given that this >>> would only be useful to people who know both languages, it seems not >>> worth complicating the format for. >> >> Agreed on all fronts. >> >> >>> Do you have any examples of real-world subtitles/captions that would >>> benefit from more fine-grained language information? >> >> This kind of information would indeed be useful. >> >> >> On Mon, 24 Jan 2011, Glenn Maynard wrote: >>> >>> They're very common in anime fansubs: >>> >>> http://img339.imageshack.us/img339/2681/screenshotgg.jpg >>> >>> The text on the left is a transcription, the top is a transliteration, >>> and the bottom is a translation. >> >> Aren't these three separate text tracks? >> >> >>> I'm pretty sure I've also seen cases of translation notes mixing >>> languages within the same caption, eg. "jinja (神社): shrine", but >>> it's less common and I don't have an example handy. >> >> Mixing one CJK language with one non-CJK language seems fine. That should >> always work, assuming you specify good fonts in the CSS. >> >> >>> > The case you mention isn't a problem, you just specify Japanese as the >>> > main language. There are a few other theoretical cases: >>> >>> Then you're indicating that English text is Japanese, which I'd expect >>> to cause UAs to render everything with a Japanese font. That's what >>> happens when I load English text in Firefox and force SJIS: everything >>> is rendered in MS PGothic. That's probably just what Japanese users >>> want for English text mixed in with Japanese text, too--but it's >>> generally not what English users want with the reverse. >> >> I don't understand why we can't have good typography for CJK and non-CJK >> together. Surely there are fonts that get both right? >> >> >> On Mon, 24 Jan 2011, Glenn Maynard wrote: >>> > >>> > [ use multiple tracks ] >>> >>> Personally I'd prefer that, but it would require a good deal of metadata >>> support--marking which tracks are meant to be used together, tagging >>> auxiliary track types so browsers can choose (eg. an "English subtitles >>> with no song caption tracks" option), and so on. I'm sure that's a >>> non-starter (and I'd agree). >> >> It's not that much metadata. It's far less effort than making the >> subtitles in the first place. >> >> >>> I don't think you should need to resort to fine-grained font control to get >>> reasonable default fonts. >> >> I agree entirely, but I don't think you should need to resort to >> fine-grained language tagging either... >> >> >>> The above--semantics vs. presentation--brings something else to mind. >>> One of the harder things to subtitle well is when you have two >>> conversations talking on top of each other. 
This is generally done by >>> choosing a vertical spot for each conversation (generally augmented with >>> a color), so the viewer can easily follow one or the other. Setting the >>> line position *sort of* lets you do this, but that's hard to get right, >>> since you don't know how far apart to put them. You'd have to err >>> towards putting them too far apart (guessing the maximum number of lines >>> text might be wrapped to, and covering up much more of the screen than >>> usually needed), or putting one set on the top of the screen (making it >>> completely impossible to read both at once, rather than just >>> challenging). >>> >>> If I remember correctly, SSA files do this with a hack: wherever there's >>> a blank spot in one or the other conversation, a transparent dummy cue >>> is added to keep the other conversation in the correct relative spot, so >>> the two conversations don't swap places. >>> >>> I mention this because it comes to mind as something well-authored, >>> well-rendered subtitles need to get right, and I'm curious if there's a >>> reliable way to do this currently with WebVTT. If this isn't handled, >>> some scenes just fall apart. >> >> It's intended to be done using the L: feature to pick the lines. If the >> cues have more line wrapping than the author expected, it'll break. The >> only way around that would be to go through the whole file (or at least, >> the whole scene, somehow marked up as such) pre-rendering each cue to work >> out what the maximum line heights would be and then using that offset for >> each cue, etc, but that seems like a whole lot of complexity for a minor >> use case. Is line wrapping really going to be that unpredictable? >> >> >> On Mon, 24 Jan 2011, Philip Jägenstedt wrote: >>> >>> My main point here is that the use cases are so marginal. If there were >>> more compelling ones, it's not hard to support intra-cue language >>> settings using syntax like <lang en>bla</lang> or similar. >> >> Indeed. >> >> >> On Mon, 24 Jan 2011, Glenn Maynard wrote: >>> >>> Here's one that I think was done very well, rendered statically to make >>> sure we're all seeing the same thing: >>> >>> http://zewt.org/~glenn/multiple%20conversation%20example.mpg >>> >>> The results are pretty straightforward. One always stays on top, one >>> always stays on the bottom, and most of the time the spacing between the >>> two is correct--the normal distance the UA uses between two vertical >>> captions (which would be lost by specifying the line height explicitly). >>> Combined with the separate coloring (which is already possible, of >>> course), it's possible to read both conversations and intuitively track >>> which is which, and it's also very easy to just pick one or the other to >>> read. >> >> As far as I can tell, the WebVTT algorithm would handle this case pretty >> well. >> >> >>> One example of how this can be tricky: at 0:17, a caption on the bottom >>> wraps and takes two lines, which then pushes the line at 0:19 upward >>> (that part's simple enough). If instead the top part had appeared >>> first, the renderer would need to figure out in advance to push it >>> upwards, to make space for the two-line caption underneath it. >>> Otherwise, the captions would be forced to switch places. >> >> Right, without lookahead I don't know how you'd solve it. With lookahead >> things get pretty dicey pretty quickly. >> >> >> On Mon, 24 Jan 2011, Tab Atkins Jr. 
wrote: >>> >>> Right now, the WebVTT spec handles this by writing the text in white on >>> top of a partially-transparent black background. The text thus never >>> has contrast troubles, at the cost of a dark block covering up part of >>> the display. >>> >>> Stroking text is easy, though. WebKit has an experimental property for >>> doing it directly. Using existing CSS, it's easy to adapt text-shadow >>> to produce a good outline - just make four shadows, offset by 1px in >>> each direction, and you're good. >> >> WebVTT allows both text-shadow and text-outline. >> >> >> On Wed, 9 Feb 2011, Silvia Pfeiffer wrote: >>> >>> We're trying to avoid the need for multiple transcodings and are trying >>> to achieve something like the following pipeline: broadcast captions -> >>> transcode to WebVTT -> show in browser -> transcode to broadcast devices >>> -> show >> >> Why not just do: >> >> broadcast captions -> transcode to WebVTT -> show in browser >> >> ...for browsers and: >> >> broadcast captions -> show >> >> ...for legacy broadcast devices? >> >> >> In any case the amount of legacy broadcast captions pales in comparison to >> the volume of new captions we will see for the Web. I'm not really >> convinced that legacy broadcast captions are an important concern here. >> >> >>> What is the argument against using <u> in captions? >> >> What is the argument _for_ using <u> in captions? We don't add features >> due to a lack of reasons not to. We add features due to a plethora of >> reasons to do so. >> >> >>> > [ foolip suggests using multiple cues to do blinking ] >>> >>> But from a captioning/subtitling point of view it's probably hard to >>> convert that back to blinking text, since we've just lost the semantic >>> by ripping it into multiple cues (and every program would use different >>> ways of doing this). >> >> I do not think round-tripping legacy broadcast captions through WebVTT is >> an important use case. If that is something that we should support, then >> we should first establish why it is an important use case, and then >> reconsider WebVTT within that context, rather than adding features to >> handle it piecemeal. >> >> >>> I guess what we are discovering is that we can define the general format >>> of WebVTT for the Web, but that there may be an additional need to >>> provide minimum implementation needs (a "profile" if you want - as much >>> as I hate this word). >> >> Personally I have nothing against the word "profile", but I do have >> something against providing for "minimum implementation needs". >> >> Interoperability means everything works the same everywhere. >> >> >>> [re versioning the file format] >>> In a contract between a caption provider and a caption consumer (I am >>> talking about companies here), the caption consumer will want to tell >>> the caption provider what kind of features they expect the caption files >>> to contain and features they want avoided. This links back to the >>> earlier identified need for "profiles". This is actually probably >>> something outside the scope of this group, but I am sure there is a need >>> for such a feature, in particular if we want to keep the development of >>> the WebVTT specification open for future extensions. >> >> I don't see why there would be a need for anything beyond "make sure it >> works with deployed software", maybe with that being explicitly translated >> to specific features and workarounds for known bugs, e.g. "you can use >> ruby, but make sure you don't have timestamps out of order". 
>> >> This, however, has no correlation to versions of the format. >> >> >> On Mon, 14 Feb 2011, Philip Jägenstedt wrote: >>> > >>> > [line wrapping] >>> >>> There's still plenty of room for improvements in line wrapping, though. >>> It seems to me that the main reason that people line wrap captions >>> manually is to avoid getting two lines of very different length, as that >>> looks quite unbalanced. There's no way to make that happen with CSS, and >>> AFAIK it's not done by the WebVTT rendering spec either. >> >> WebVTT just defers to CSS for this. I agree that it would be nice for CSS >> to allow UAs to do more clever things here and (more importantly) for UAs >> to actually do more clever things here. >> >> >> On Tue, 15 Feb 2011, Silvia Pfeiffer wrote: >>> foolip wrote: >>> > >>> > Sure, it's already handled by the current parsing spec, since it >>> > ignores everything up to the first blank line. >>> >>> That's not quite how I'm reading the spec. >>> >>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#webvtt-0 >>> allows >>> "Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER >>> TABULATION (tab) character followed by any number of characters that >>> are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR) >>> characters." >>> after the "WEBVTT FILE" magic. >>> To me that reads like all of the extra stuff has to be on the same line. >>> I'd prefer if this read "any character except for two WebVTT line >>> terminators", then it would all be ready for such header-style >>> metadata. >> >> That's the syntax rules. It's not the parser. >> >> >>> I'm told <u> is fairly common in traditional captions. >> >> I've never seen it. Do you have any data on this? >> >> >>> > Personally, I think we're going to see more and more devices running >>> > full browsers with webfonts support, and that this isn't going to be a >>> > big problem. >>> >>> I tend to agree and in fact I see that as the shiny future. Just not >>> quite yet. >> >> We're not quite at WebVTT yet either. Currently, there's more support for >> WebFonts than WebVTT. >> >> >> On Tue, 15 Feb 2011, Glenn Maynard wrote: >>> >>> I think that, no matter what you do, people will insert line breaks in >>> cues. I'd follow the HTML model here: convert newlines to spaces and >>> have a separate, explicit line break like <br> if needed, so people >>> don't manually line-break unless they actually mean to. >> >> The line-breaks-are-line-breaks feature is one of the features that >> originally made SRT seem like a good idea. It still seems like the neatest >> way of having a line break. >> >> >>> Related to line breaking, should there be an escape? Inserting >>> nbsp literally into files is somewhat annoying for authoring, since >>> they're indistinguishable from regular spaces. >> >> How common would that be? >> >> >> On Thu, 10 Feb 2011, Silvia Pfeiffer wrote: >>> >>> Further discussions at Google indicate that it would be nice to make >>> more components optional. Can we have something like this: >>> >>> [[h*:]mm:]ss[.[d[c[m]]] | s*[.d[c[m]]] >>> >>> Examples: >>> 23 = 23 seconds >>> 23.2 = 23 sec, 1 decisec >>> 1:23.45 = 1 min, 23 sec, 45 centisec >>> 123.456 = 123 sec, 456 millisec >> >> Currently the syntax is [h*:]mm:ss.sss; what's the advantage of making >> this more complicated? It's not like most subtitled clips will be shorter >> than a minute. Also, why would we want to support multiple redundant ways >> of expressing the same time? (e.g. 
01:00.000 and 60.000) >> >> Readability of VTT files seems like it would be helped by consistency, >> which suggests using the same format everywhere, as much as possible. >> >> >> On Sun, 16 Jan 2011, Mark Watson wrote: >>> >>> I have been looking at how the video element might work in an adaptive >>> streaming context where the available media are specified with some kind >>> of manifest file (e.g. MPEG DASH Media Presentation Description) rather >>> than in HTML. >>> >>> In this context there may be choices available as to what to present, >>> many but not all related to accessibility: >>> >>> - multiple audio languages >>> - text tracks in multiple languages >>> - audio description of video >>> - video with open captions (in various languages) >>> - video with sign language >>> - audio with directors commentary >>> - etc. >>> >>> It seems natural that for text tracks, loading the manifest could cause >>> the video element to be populated with associated <track> elements, >>> allowing the application to discover the choices and activate/deactivate >>> the tracks. >> >> Not literal <track> elements, hopefully, but in-band text tracks (known as >> "media-resource-specific text track" in the spec). >> >> >>> But this seems just for text tracks. I know discussions are underway on >>> what to do for other media types, but my question is whether it would be >>> better to have a consistent solution for selection amongst the available >>> media that applies for all media types ? >> >> They're pretty different from each other, so I don't know that one >> solution would make sense for all of these. >> >> Does the current solution (the videoTracks, audioTracks, and textTracks >> attributes) adequately address your concern? >> >> >> On Mon, 17 Jan 2011, Jeroen Wijering wrote: >>> >>> We are getting some questions from JW Player users that HTML5 video is >>> quite wasteful on bandwidth for longer videos (think 10min+). This >>> because browsers download the entire movie once playback starts, >>> regardless of whether a user pauses the player. If throttling is used, >>> it seems very conservative, which means a lot of unwatched video is in >>> the buffer when a user unloads a video. >>> >>> I did a simple test with a 10 minute video: playing it; pausing after 30 >>> seconds and checking download progress after another 30 seconds. With >>> all browsers (Firefox 4, Safari 5, Chrome 8, Opera 11, iOS 4.2), the >>> video would indeed be fully downloaded after 60 seconds. Some throttling >>> seems to be applied by Safari / iOS, but this could also be bandwidth >>> fluctuations on my side. Either way, all browsers downloaded the 10min >>> video while only 30 seconds were being watched. >>> >>> The HTML5 spec is a bit generic on this topic, allowing mechanisms such >>> as stalling and throttling but not requiring them, or prescribing a >>> scripting interface: >>> >>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource >> >> Right, this is an area that is left up to implementations; a quality of >> implementation issue. >> >> >>> A suggestion would be to implement / expose a property called >>> "downloadBufferTarget". It would be the amount of video in seconds the >>> browser tries to keep in the download buffer. >> >> Wouldn't this be very situation-specific? e.g. if I know I'm about to go >> into a tunnel for five minutes, I want five minutes of buffered data. 
If >> my connection has a high packet loss rate and could stall for upwards of >> 10 seconds, I want way more than 10 seconds in my buffer. If my connection >> is such that I can't download data in realtime, I want the whole video in >> my buffer. If my connection is such that I have 8ms latency to the video >> server and enough bandwidth to transfer the whole four hour file in 3 >> seconds, then really I don't need anything in my buffer. >> >> >> On Mon, 17 Jan 2011, Roger Hågensen wrote: >>> On 2011-01-17 18:36, Markus Ernst wrote: >>> > >>> > Could this be done at the user side, e.g. with some browser setting? >>> > Or even by a "stop downloading" control in the player? An intuitive >>> > user control would be separate stop and pause buttons, as we know them >>> > from tape and CD players. Pause would then behave as it does now, >>> > while stop would cancel downloading. >>> >>> I think that's the right way to do it, this should be in the hands of >>> the user and exposed as a preference in the browsers. >> >> Agreed. >> >> >>> Although exposing (read only?) the user's preferred buffer setting to >>> the HTML App/Plugin etc. would be a benefit I guess as the desired >>> buffering could be communicated back to the streaming server for example >>> for a better bandwidth utilization. >> >> How would the information be used? >> >> >> On Mon, 17 Jan 2011, Zachary Ozer wrote: >>> >>> What no one has mentioned so far is that the real issue isn't the >>> network utilization or the memory capacity of the devices, it's >>> bandwidth cost. >>> >>> The big issue for publishers is that they're incurring higher costs when >>> using the <video> tag, which is a disincentive for adoption. >>> >>> Since there are situations where both the publisher and the user are >>> potentially incurring bandwidth costs (or have other limitations), we >>> could allow the publisher to specify downloadBufferTarget and the user >>> to specify a setting in the browser's config. The browser would then >>> actually buffer min(user setting, downloadBufferTarget). At that point >>> there would probably need to be another read-only property that >>> specified what value the browser is currently using as its buffer >>> length, but maybe the getter for downloadBufferTarget is sufficient. >> >> I think before we get something that elaborate set up, we should just try >> getting preload="" implemented. :-) That might be sufficient. >> >> >> On Tue, 18 Jan 2011, Robert O'Callahan wrote: >>> >>> One solution that could work here is to honour dynamic changes to >>> 'preload', so switching preload to 'none' would stop buffering. Then a >>> script could do that, for example, after the user has paused the video >>> for ten seconds. The script could also look at 'buffered' to make its >>> decision. >> >> If browsers want to do that I'm quite happy to add something explicitly to >> that effect to the spec. Right now the spec doesn't disallow it. >> >> >> On Wed, 19 Jan 2011, Philip Jägenstedt wrote: >>> >>> The only difference between preload=none and preload=metadata is how >>> much is fetched if the user doesn't interact at all with the video. Once >>> the user has begun playing, I think the two mean the same thing: "please >>> don't waste my bandwidth more than necessary". In other words, I think >>> that for preload=metadata, browsers should be somewhat conservative even >>> after playback has begun, not going all the way to the preload=auto >>> behavior. 
>> >> The descriptions are somewhat loose, but something like this could work, >> yes. (Though I'd say after playing preload=metadata and preload=auto are >> the same and preload=none is the one that says to avoid bandwidth usage, >> but that's just an artifact of the way I wrote the descriptions.) >> >> >> On Tue, 18 Jan 2011, Zachary Ozer wrote: >>> >>> Currently, there's no way to stop / limit the browser from buffering - >>> once you hit play, you start downloading and don't stop until the >>> resource is completely loaded. This is largely the same as Flash, save >>> the fact that some browsers don't respect the preload attribute. (Side >>> note: I also haven't found a browser that stops loading the resource >>> even if you destroy the video tag.) >>> >>> There have been a few suggestions for how to deal with this, but most >>> have revolved around using downloadBufferTarget - a settable property >>> that determines how much video to buffer ahead in seconds. Originally, >>> it was suggested that the content producers should have control over >>> this, but most seem to favor the client retaining some control since >>> they are the most likely to be in low bandwidth situations. (Publishers >>> who want strict bandwidth control could use a more advanced server and >>> communication layer ala YouTube). >>> >>> The simplest enhancement would be to honor the downloadBufferTarget only >>> when readyState=HAVE_ENOUGH_DATA and playback is paused, as this would >>> imply that there is not a low bandwidth situation. >> >> It seems the simplest enhancement would be to have the browsers do the >> right thing (e.g. download enough to get to HAVE_ENOUGH_DATA and stop if >> the video is paused, or some such), not to add a feature that all Web >> authors would have to handle. >> >> >> On Tue, 18 Jan 2011, Boris Zbarsky wrote: >>> >>> In general, depending on finalizers to release resources (which is >>> what's happening here) is not really a workable setup. Maybe we need an >>> api to explicitly release the data on an audio/video tag? >> >> The spec suggests removing the element's src="" attribute and <source> >> elements and then calling the element's load() method. >> >> The spec also suggests that implementors release all resources used by a >> media element when that media element is an orphan when the event loop >> spins. >> >> See the "Best practices for authors using media elements" and "Best >> practices for implementors of media elements" sections. >> >> >> On Wed, 19 Jan 2011, Andy Berkheimer wrote: >>> >>> In the case where the viewer does not have enough bandwidth to stream >>> the video in realtime, there are two basic options for the experience: >>> - buffer the majority of the video (per Glenn and Boris' discussion) >>> - switch to a lower bitrate that can be streamed in realtime >>> >>> This thread has focused primarily of the first option and this is an >>> experience that we see quite a bit. This is the option favored amongst >>> enthusiasts and power users, and also makes sense when a viewer has made >>> a purchase with an expectation of quality. And there's always the >>> possibility that the user does not have enough bandwidth for even the >>> lowest available bitrate. >>> >>> But the second option is the experience that the majority of our viewers >>> expect. >>> >>> The ideal interface would have a reasonable default behavior but give an >>> application the ability to implement either experience depending on user >>> preference (or lack thereof), viewing context, etc. 
>> >> Agreed. This is the kind of thing that a good streaming protocol can >> negotiate in realtime. >> >> >>> I believe Chrome's current implementation _does_ stall the HTTP >>> connection (stop reading from the socket interface but keep it open) >>> after some amount of readahead - a magic hardcoded constant. We've run >>> into issues there - their browser readahead buffer is too small and >>> causing a lot of underruns. >> >> It's early days. File bugs! >> >> >>> No matter how much data you pass between client and server, there's >>> always some useful playback state that the client knows and the server >>> does not - or the server's view of the state is stale. This is >>> particularly true if there's an HTTP proxy between the user agent and >>> the server. Any behavior that could be implemented through an advanced >>> server/communication layer can be achieved in a simpler, more robust >>> fashion with a solid buffer management implementation that provides >>> "advanced" control through javascript and attributes. >> >> The main difference is that a protocol will typically be implemented a few >> times by experienced programmers writing servers and clients, which will >> then be deployed and used by less experienced (in this kind of thing) Web >> developers, while if we just expose it to JavaScript, the people >> implementing it will be a combination of experienced library authors and >> those same Web developers, and the result will likely be less successful. >> >> However, the two aren't mutually exclusive. We could do one and then later >> (or at the same time) do the other. >> >> >> On Tue, 18 Jan 2011, Roger Hågensen wrote: >>> >>> It may sound odd but in low storage space situations, it may be >>> necessary to unbuffer what has been played. Is this supported at all >>> currently? >> >> Yes. >> >> >>> I think that the buffering should basically be a "moving window" (I hope >>> most here are familiar with this term?), and that the size of the moving >>> window should be determined by storage space and bandwidth and browser >>> preference and server preference, plus make sure the window supports >>> skipping anywhere without needing to buffer up to it, and avoid >>> buffering from the start just because the user skipped back a little to >>> catch something they missed (another annoyance). This is the only >>> logical way to do this really. Especially since HTTP 1.1 has byterange >>> support there is nothing preventing it from being implemented, and I >>> assume other popular streaming protocols supports byterange as well? >> >> Implementations are allowed to do that. >> >> >> On Tue, 18 Jan 2011, Silvia Pfeiffer wrote: >>> >>> I think that's indeed one obvious improvement, i.e. when going to pause >>> state, stop buffering when readyState=HAVE_ENOUGH_DATA (i.e. we have >>> reached canplaythrough state). >> >> The spec allows this already. >> >> >>> However, again, I don't think that's sufficient. Because we will also >>> buffer during playback and it is possible that we buffer fast enough to >>> have buffered e.g. the whole of a 10min video by the time we hit pause >>> after 1 min and stop watching. That's far beyond canplaythrough and >>> that's 9min worth of video download wasted bandwidth. This is where the >>> suggested downloadBufferTarget would make sense. It would basically >>> specify how much more to download beyond HAVE_ENOUGH_DATA before pausing >>> the download. >> >> I don't understand how a site can know what the right value is for this. 
>> Users aren't going to understand that they have to control the buffering >> if (e.g.) they're about to go into a tunnel and they want to make sure >> it's buffered all the way through. It should just work, IMHO. >> >> >> On Tue, 18 Jan 2011, David Singer wrote: >>> >>> If you want a more tightly coupled supply/consume protocol, then use >>> one. As long as it's implemented by client and server, you're on. >>> >>> Note that the current move of the web towards download in general and >>> HTTP in particular is due in no small part to the fact that getting more >>> tightly coupled protocols -- actually, any protocol other than HTTP -- >>> out of content servers, across firewalls, through NATs, and into clients >>> is...still a nightmare. So, we've been given a strong incentive by all >>> those to use HTTP. It's sad that some of them are not happy with that >>> result, but it's going to be hard to change now. >> >> Agreed, though in practice there are certainly ways to get two-way >> protocols through. WebSocket does a pretty good job, for example. But >> designing a protocol for this is out of scope for this list, really. >> >> >> On Tue, 18 Jan 2011, David Singer wrote: >>> >>> In RTSP-controlled RTP, there is a tight relationship between the play >>> point, and play state, the protocol state (delivering data or paused) >>> and the data delivered (it is delivered in precisely real-time, and >>> played and discarded shortly after playing). The server delivers very >>> little more data than is actually watched. >>> >>> In HTTP, however, the entire resource is offered to the client, and >>> there is no protocol to convey play/paused back to the server, and the >>> typical behavior when offered a resource in HTTP is to make a simple >>> binary decision to either load it (all) or not load it (at all). So, by >>> providing a media resource over HTTP, the server should kinda be >>> expecting this 'download' behavior. >>> >>> Not only that, but if my client downloads as much as possible as soon as >>> possible and caches as much as possible, and yours downloads as little >>> as possible as late as possible, you may get brownie points from the >>> server owner, but I get brownie points from my local user -- the person >>> I want to please if I am a browser vendor. There is every incentive to >>> be resilient and 'burn' bandwidth to achieve a better user experience. >>> >>> Servers are at liberty to apply a 'throttle' to the supply, of course >>> ("download as fast as you like at first, but after a while I'll only >>> supply at roughly the media rate"). They can suggest that the client be >>> a little less aggressive in buffering, but it's easily ignored and the >>> incentive is to ignore it. >>> >>> So I tend to return to "if you want more tightly-coupled behavior, use a >>> more tightly-coupled protocol"... >> >> Indeed. >> >> >> On Wed, 19 Jan 2011, Philip Jägenstedt wrote: >>> >>> The 3 preload states imply 3 simple buffering strategies: >>> >>> none: don't touch the network at all >>> metadata: buffer as little as possible while still reaching readyState >>> HAVE_METADATA >>> auto: buffer as fast and much as possible >> >> "auto" isn't "as fast and much as possible", it's "as fast and much as >> will make the user happy". In some configurations, it might be the same as >> "none" (e.g. if the user is paying by the byte and hates video). >> >> >>> However, the state we're discussing is when the user has begun playing the >>> video. 
The spec doesn't talk about it, but I call it: >>> >>> invoked: buffer as little as possible without readyState dropping below >>> HAVE_FUTURE_DATA (in other words: being able to play from currentTime to >>> duration at playbackRate without waiting for the network) >> >> There's also a fifth state, let's call it "aggressive", where even while >> playing the video the UA is trying to download the whole thing in case the >> connection drops. >> >> >>> If the available bandwidth exceeds the bandwidth of the resource, some >>> kind of throttling must eventually be used. There are mainly 2 options >>> for doing this: >>> >>> 1. Throttle at the TCP level by not reading data from the socket (not at all >>> to suspend, or at a controlled rate to buffer ahead) >>> 2. Use HTTP byte ranges, making many smaller requests with any kind of >>> throttling at the TCP level >> >> There's also option 3, to handle the fifth state above: don't throttle. >> >> >>> When HTTP byte ranges are used to achieve bandwidth management, it's >>> hard to talk about a single downloadBufferTarget that is the number of >>> seconds buffered ahead. Rather, there might be an upper and lower limit >>> within which the browser tries to stay, so that each request can be of a >>> reasonable size. Neither an author-provided minimum nor maximum value can >>> be followed particularly closely, but could possibly be taken as a hint >>> of some sort. >> >> Would it be a more useful hint than "preload"? I'm skeptical about adding >> many hints with no requirements. If there's some specific further >> information we can add, though, it might make sense to add more features >> to "preload". >> >> >>> The above buffering strategies are still not enough, because users seem >>> to expect that in a low-bandwidth situation, the video will keep >>> buffering until they can watch it through to the end. These seem to be >>> the options for solving the problem: >>> >>> * Make sites that want this behavior set .preload='auto' in the 'paused' >>> event handler >>> >>> * Add an option in the context menu to "Preload Video" or some such >>> >>> * Cause an invoked (see dfn above) but paused video to behave like >>> preload=auto >>> >>> * As above, but only when the available bandwidth is limited >>> >>> I don't think any of these solutions are particularly good, so any input >>> on other options is very welcome! >> >> If users expect something, it seems logical that it should just happen. I >> don't have a problem with saying that it should depend on preload="", >> though. If you like I can make the spec explicitly describe what the >> preload="" hints mean while video is playing, too. >> >> >> On Wed, 19 Jan 2011, Zachary Ozer wrote: >>> >>> What if, instead of trying to solve this problem, we leave it up to the >>> publishers? The current behavior would be unchanged, but we could add >>> explicit bandwidth management API calls, ie startBuffer() and >>> stopBuffer(). This would let developers / site publishers control how >>> much to buffer and when. >> >> We couldn't depend on it (most people presumably won't want to do anything >> but give the src="" of their video). >> >> >>> We might also consider leaning on users a bit to tell us what they want. >>> For example, I think people are pretty used to hitting play and then >>> pause to buffer until the end of the video. What if we just used our >>> bandwidth heuristics while in the play state, and buffered blindly when >>> a pause occurs less than X seconds into a video? 
I won't argue that this >>> is a wonderful solution (or a habit we should encourage), but I figured >>> I'd throw a random idea out there... >> >> That seems like pretty ugly UI. :-) >> >> >> On Thu, 20 Jan 2011, Glenn Maynard wrote: >>> >>> I think that pausing shouldn't affect read-ahead buffering behavior. >>> I'd suggest another preload value, preload=buffer, sitting between >>> "metadata" and "auto". In addition to everything loaded by "metadata", >>> it also fills the read-ahead buffer (whether the video is playing or >>> not). >>> >>> - If a page wants prebuffering only (not full preloading), it sets >>> preload=buffer. This can be done even when the video is paused, so when >>> the user presses play, the video starts instantly without pausing for a >>> server round-trip like preload=metadata. >> >> So this would be to buffer enough to play through assuming the network >> remains at the current bandwidth, but no more? >> >> >>> - If a page wants prebuffering while playing, but unlimited buffering when >>> paused (per Zachary's suggestion), it sets preload=buffer when playing and >>> preload=auto when paused. >> >> Again, note that "auto" doesn't mean "buffer everything", it means "do >> whatever is best for the user". >> >> I don't mind adding new values if the browser vendors are going to use >> them. >> >> >> On Sat, 22 Jan 2011, David Singer wrote: >>> >>> When the HTML5 states were first proposed, I went through a careful >>> exercise to make sure that they were reasonably delivery-technology >>> neutral, i.e. that they applied equally well if say RTSP/RTP was used, >>> some kind of dynamic streaming, simple HTTP, and so on. >>> >>> I am concerned that we all tend to assume that HTML==HTTP, but the >>> source URL for the media might have any protocol type, and the HTML >>> attributes, states etc. should apply (or clearly not apply) to anything. >>> >>> Assuming only HTTP, in the markup, is not a good direction. >> >> Agreed. >> >> >> On Thu, 20 Jan 2011, Matthew Gregan wrote: >>> >>> The media seek algorithm (4.8.10.9) states that the current playback >>> position should be set to the new playback position during the >>> asynchronous part of the algorithm, just before the seeking event is >>> fired. [...] >> >> On Thu, 20 Jan 2011, Philip Jägenstedt wrote: >>> >>> There have been two non-trivial changes to the seeking algorithm in the >>> last year: >>> >>> Discussed at http://lists.w3.org/Archives/Public/public-html/2010Feb/0003.html >>> led to http://html5.org/r/4868 >>> >>> Discussed at http://lists.w3.org/Archives/Public/public-html/2010Jul/0217.html >>> led to http://html5.org/r/5219 >> >> Yeah. In particular, sometimes there's no way for the UA to know >> asynchronously if the seek can be done, which is why the attribute is set >> after the method returns. It's not ideal, but the alternative is not >> always implementable. >> >> >>> With that said, it seems like there's nothing that guarantees that the >>> asynchronous section doesn't start running while the script is still >>> running. >> >> Yeah. It's not ideal, but I don't really see what we can do about it. >> >> >>> It's also odd that currentTime is updated before the seek has actually >>> been completed, but the reason for this is that the UI should show the >>> new position. >> >> Not just the UI. The current position is what the browser is trying to >> play; if the current position didn't move, then the browser wouldn't be >> trying to play it. 
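A small script illustrating the ordering described above, as the spec text stood at the time (whether currentTime should change synchronously was still under discussion, so treat this as a sketch rather than guaranteed behaviour):

  var video = document.querySelector('video');
  video.addEventListener('seeking', function () {
    // By this point currentTime already reports the new position,
    // even though the media data for it may not have arrived yet.
    console.log('seeking to ' + video.currentTime);
  });
  video.addEventListener('seeked', function () {
    console.log('seek completed at ' + video.currentTime);
  });
  video.currentTime = 60; // request a seek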
>> >> >> On Fri, 4 Feb 2011, Matthew Gregan wrote: >>> >>> For anyone following along, the behaviour has now been changed in the >>> Firefox 4 nightly builds. >> >> On Mon, 24 Jan 2011, Robert O'Callahan wrote: >>> >>> I agree. I think we should change behavior to match author expectations >>> and the other implementations, and let the spec change to match. >> >> How do you handle the cases where it's not possible? >> >> >> If all the browsers can do it, I'm all for going back to having >> currentTime change synchronously. >> >> >> On Sat, 29 Jan 2011, Lubomir Toshev wrote: >>> >>> [W]hen the video tag has embedded browser controls displayed and I click >>> anywhere on the controls, they cause a video tag click event. If I want >>> to toggle play/pause on video area click, then I cannot do this, because >>> clicking on the play control button, fires play, then click event fires >>> for video tag and when I toggle it pauses. So this behavior that every >>> popular flash player has cannot be achieved. There is no way to >>> understand that the click.target is the embedded browser controls area. >>> I think that a nice improvement will be to expose this information, in >>> the target, that it actually is embedded browser controls. Or clicking >>> the embedded browser controls should not produce a click event for video >>> tag. After all browser controls are native and do not have >>> representation in the DOM. Let me know what you think about this. >> >> On Sat, 29 Jan 2011, Aryeh Gregor wrote: >>> >>> Well, to begin with, you could just use your own controls rather than >>> the browser's built-in controls. Then you have no problem. If you're >>> using the browser's built-in controls, maybe you should stick with the >>> browser's control conventions throughout, which presumably doesn't >>> include toggling play/pause on click. >>> >>> I'm not sure this is a broad enough problem to warrant exposing the >>> extra information in the target. Are there any other use-cases for such >>> info? >> >> On Sun, 30 Jan 2011, Lubomir Toshev wrote: >>> >>> To elaborate a bit, I'm a control developer and I have my own custom >>> controls. But we want to allow for the customer to use the default >>> browser controls if they want to. This can be done by switching an >>> option in my jQuery widget - browserControls - true/false. Or through >>> browser context menu shown by default on right click. So I'm trying to >>> be flexible enough for the customer. >>> >>> I was thinking about this >>> 1) that adding a transparent overlay over the browser controls >>> Or >>> 2) to detect the click position and if it is some pixels away from the >>> bottom of the video tag >>> >>> will fix this, but every browser has different height for its embedded >>> controls and I should hardcode this height in my code, which is just not >>> manageable. >>> >>> I can always add a limitation when using browser controls, toggle >>> play/pause on video area click will be turned off, but I want to achieve >>> similar behavior in all the browsers no matter whether they use embedded >>> controls or not. >>> >>> So I think this tiny click.target thing will be very useful. >> >> On Sun, 30 Jan 2011, Glenn Maynard wrote: >>> >>> Even as a bad hack it's simply not possible; for example, there's no way >>> to tell whether a pop-out volume control is open or not. >>> >>> I think the primary use case browser controls are meant for is when >>> scripting isn't available at all. 
They aren't very useful when you're >>> using any kind of scripts with the video. Another problem, related to >>> your other post about captioning, is that it's impossible to put >>> anything between the video and the controls, so your captions will draw >>> *on top of* browser controls. >> >> On Mon, 31 Jan 2011, Simon Pieters wrote: >>> >>> See http://lists.w3.org/Archives/Public/public-html/2009Jun/0395.html >>> >>> I suggested that the browser would not generate an event at all when >>> using the native controls. Seemingly there was no reply to Hixie's >>> request for opinion from other implementors. >> >> On Mon, 31 Jan 2011, Glenn Maynard wrote: >>> >>> There are other meaningful ways to respond to these events; for example, >>> to pull its container to the top of the draw order if it's a floating >>> window. I should be able to capture mousedown on the container to do >>> this, regardless of content. >> >> On Mon, 31 Jan 2011, Simon Pieters wrote: >>> >>> How about just suppressing activation events like click? >> >> On Mon, 31 Jan 2011, Glenn Maynard wrote: >>> >>> That makes more sense than suppressing the entire mousedown/mouseup >>> events (and keydown, touchstart, etc). >>> >>> Also, it means you can completely emulate the event behavior of the >>> default browser controls with scripts: preventDefault on mousedown to >>> prevent click events. That's probably not what you actually want to do, >>> but it means the default controls aren't doing anything special: their >>> effect on events can be understood entirely in terms of what scripted >>> events can already do. >> >> On Mon, 31 Jan 2011, Lubomir Toshev wrote: >>> >>> I totally agree that events should not be raised, when they originate >>> from the native browser controls. This would make it much simpler. I >>> filed the same bug for Opera 11 last week. >> >> As with the post Simon cites above, I'm happy to do this kind of thing, if >> multiple vendors agree that it makes sense. If you would like this to be >> done, I recommend getting other browser vendors to tell me it sounds good! >> >> >> On Sat, 29 Jan 2011, Lubomir Toshev wrote: >>> >>> [V]ideo should expose API for currentFrame, so that when control >>> developers want to add support for subtitles on their own, to be able to >>> support formats that display the subtitles according to the current >>> video frame. This is a limitation to the current design of the video >>> tag. >> >> On Sun, 30 Jan 2011, Lubomir Toshev wrote: >>> >>> We were trying to add support for subtitles for our player control that >>> uses video tag as its base. There are two popular subtitle formats *.srt >>> which uses currentTime to show the subtitles where they should be. Like >>> 0:01:00 - 0:01:30 - "What a nice hotel." While the other popular format >>> is *.sub which uses the currentFrame to show the proper subtitles. Like >>> {45600}, {45689} - "What a nice hotel". And if I want to add this >>> support it would be good if video tag exposes currentFrame, so that I >>> can show properly the subtitles in a span positioned over the video. Now >>> does it make more sense? >>> >>> I know video will have embedded subtitle support, but I think that it >>> should be flexible enough to allow building such features like the one >>> above. What do you think? To me this is worth adding because, it should >>> be really easy to implement? >> >> We'll probably add that along with the metrics, when we add those, if >> there's a strong use case for it. 
I'm not sure that supporting frame-based >> subtitles is a good use case though. >> >> >> On Mon, 14 Feb 2011, David Flanagan wrote: >>> >>> The draft specification defines 20+ media event handler IDL attributes >>> on HTMLElement. These events are non-bubbling and are always targeted >>> at <audio> and <video> tags, so I wonder if they wouldn't be better >>> defined on HTMLMediaElement instead. >> >> All event handlers are on HTMLElement, to make implementations easier and >> to make the platform simpler. >> >> >> On Tue, 15 Feb 2011, David Flanagan wrote: >>> >>> Fair enough, though I do think it will confuse developers who will think >>> that those media events bubble. (I'll be documenting them as properties >>> of HTMLMediaElement). >> >> Whether an event bubbles or not is up to the place that dispatches the >> event, not the place that hears the event. >> >> >>> What about Document and Window? What's the justification for defining >>> the media event handler attributes on those objects? >> >> Same. It allows the same logic to be used everywhere. >> >> >> On Mon, 14 Feb 2011, Kevin Marks wrote: >>> On Mon, Feb 14, 2011 at 2:39 PM, Ian Hickson <ian@hixie.ch> wrote: >>> > On Fri, 19 Nov 2010, Per-Erik Brodin wrote: >>> > > >>> > > We are about to start implementing stream.record() and >>> > > StreamRecorder. The spec currently says that "the file must be in >>> > > a format supported by the user agent for use in audio and video >>> > > elements" which is a reasonable restriction. However, there is >>> > > currently no way to set the output format of the resulting File that >>> > > you get from recorder.stop(). It is unlikely that specifying a >>> > > default format would be sufficient if you in addition to container >>> > > formats and codecs consider resolution, color depth, frame rate etc. >>> > > for video and sample size and rate, number of channels etc. for >>> > > audio. >>> > > >>> > > Perhaps an argument should be added to record() that specifies the >>> > > output format from StreamRecorder as a MIME type with parameters? >>> > > Since record() should probably throw when an unsupported type is >>> > > supplied, it would perhaps be useful to have a canRecordType() or >>> > > similar to be able to test for supported formats. >>> > >>> > I haven't added anything here yet, mostly because I've no idea what to >>> > add. The ideal situation here is that we have one codec that everyone >>> > can read and write and so don't need anything, but that may be >>> > hopelessly optimistic. >>> >>> That isn't the ideal, as it locks us into the current state of the art >>> forever. The ideal is to enable multiple codecs + formats that can be >>> swapped out over time. That said, uncompressed audio is readily >>> codifiable, and we could pick a common file format, sample rate, >>> bitdepth and channel count specification. >> >> It doesn't lock us in to one format, we can always add more formats later. >> Right now, we have zero formats, so one format would be a huge step up. 
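Returning to the event handler discussion earlier in this message, a short illustration of the non-bubbling point (a sketch only; the handler properties are defined broadly, but the events themselves are targeted at the media element):

  var video = document.querySelector('video');
  video.onplay = function () {
    // Fires: the 'play' event is targeted at the media element itself.
  };
  document.onplay = function () {
    // The property exists on Document too, but 'play' does not bubble,
    // so this never fires for the <video> above; a capturing listener
    // on an ancestor would be needed to observe it from there.
  };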
>> >> >> On Fri, 4 Mar 2011, Philip Jägenstedt wrote: >>> On Thu, 03 Mar 2011 22:15:58 +0100, Aaron Colwell <acolwell@google.com> >>> wrote: >>> > >>> > I was looking at the resource fetch >>> > algorithm <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#concept-media-load-resource> >>> > and fetching resources >>> > <http://www.whatwg.org/specs/web-apps/current-work/multipage/urls.html#fetch> >>> > sections of the HTML5 spec to determine what the proper behavior is >>> > for handling redirects. Both YouTube and Vimeo do 302 redirects to >>> > different hostnames from the URLs specified in the src attribute. It >>> > looks like the spec says that playback should fail in these cases >>> > because they are from different origins (Section 2.7 Fetching >>> > resources bullet 7). This leads me to a few questions. >>> > >>> > 1. Is my interpretation of the spec correct? Sample YouTube & Vimeo URLs are >>> > shown below. >>> > YouTube : src : http://v22.lscache6.c.youtube.com/videoplayback? ... >>> > redirect : http://tc.v22.cache6.c.youtube.com/videoplayback? >>> > ... >>> > >>> > Vimeo : src : http://player.vimeo.com/play_redirect? ... >>> > redirect : http://av.vimeo.com/05 ... >>> >>> Yes, from what I can tell you're correct, but I think it's not >>> intentional. The behavior was changed by <http://html5.org/r/5111> in >>> 2010-06-25, and this is the first time I've noticed it. Opera (and I >>> assume most if not all other browsers) already supports HTTP redirects >>> for <video> and I don't think it makes much sense to disallow it. For >>> security purposes, the origin of the resource is considered to be the >>> final destination, not any of the origins in the redirect chain. >> >> This was fixed recently. >> >> >> On Fri, 18 Mar 2011, Eric Winkelman wrote: >>> >>> For in-band metadata tracks, there is neither a standard way to >>> represent the type of metadata in the HTMLTrackElement interface nor is >>> there a standard way to represent multiple different types of metadata >>> tracks. >> >> There can be a standard way. The idea is that all the types of metadata >> tracks that browsers will support should be specified so that all browsers >> can map them the same way. I'm happy to work with anyone interested in >> writing such a mapping spec, just let me know. >> >> >>> Proposal: >>> >>> For TimedTextTracks with kind=metadata the @label attribute should >>> contain a MIME type for the metadata and that a track only contain Cues >>> created from metadata of that MIME type. >>> >>> This implies that streams with multiple types of metadata require the >>> creation of multiple metadata track objects, one for each MIME type. >> >> This might make sense if we had a defined way of getting such a MIME type >> (and assuming you're talking about the IDL attributes, not the content >> attributes). >> >> >> On Tue, 22 Mar 2011, Eric Winkelman wrote: >>> >>> Ah, yes, now I understand the confusion. Within the whatwg specs, the >>> word "attribute" is generally used and I was trying to be consistent. >> >> The WHATWG specs refer to content attributes (those on elements) and IDL >> attributes (those on objects, which generate properties in JS). The @foo >> syntax is never used in the WHATWG specs. It's usually used in a W3C >> context just to refer to content attributes, by analogy to the XPath >> syntax. (Personally I prefer foo="" since it's less ambiguous.) 
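To make the content-attribute/IDL-attribute distinction concrete, a small illustration using the media element (names here are just the standard ones; nothing beyond that is assumed):

  var video = document.querySelector('video');
  video.getAttribute('src'); // the content attribute, exactly as written in the markup
  video.src;                 // the IDL attribute, reflecting it as a resolved absolute URL
  video.currentSrc;          // the URL of the resource actually selected (from src or <source>)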
>> >> >> On Mon, 21 Mar 2011, Eric Winkelman wrote: >>> >>> No, I'm not saying that, but as far as I can tell from the spec, it is >>> undefined how the user agent should map in-band data to metadata tracks. >>> I am proposing that the algorithm should be that different types of data >>> should go into different Timed Text Tracks, and that the track's @label >>> should reflect the type. >> >> To the extent that it is defined, it is defined here: >> >> http://www.whatwg.org/specs/web-apps/current-work/complete.html#sourcing-in-band-text-tracks >> >> But the theory, as mentioned above, is that specific types of in-band >> metadata tracks would have explicit specs written to define how the >> mapping is done. >> >> >>> Recent updates to the spec, section 4.8.10.12.2 >>> (http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#sourcing-in-band-text-tracks) >>> appear to address my concern in step 2: >>> >>> "2. Set the new text track's kind, label, and language based on the >>> semantics of the relevant data, as defined by the relevant >>> specification." >>> >>> Provided that the relevant specification defines the metadata type >>> encoding to be put in the label, e.g. application/x-eiss, >>> application/x-scte35, application/x-contentadvisory, etc. >> >> Well, the problem is that there typically is no applicable specification, >> or that it is too vague. >> >> >> On Tue, 22 Mar 2011, Lachlan Hunt wrote: >>> >>> This is regarding the recently added audioTracks and videoTracks APIs to >>> the HTMLMediaElement. >>> >>> The design of these APIs seems to be done a little strangely, in that >>> dealing with each track is done by passing an index to each method on >>> the TrackList interfaces, rather than treating the audioTracks and >>> videoTracks as collections of individual audio/video track objects. This >>> design is inconsistent with the design of the TextTrack interface, and >>> seems sub-optimal. >> >> It is intended to avoid an explosion of objects. TextTrack needs to be an >> object because it has separate state, gets targeted for events, has >> different versions (e.g. MutableTextTrack), etc. Audio and Video tracks >> are, on the other hand, rather trivial constructs. >> >> >>> The use of ExclusiveTrackList for videoTracks also seems rather >>> limiting. What about cases where the second video track is a >>> sign-language track, or some other video overlay? >> >> You use a separate <video> element. >> >> I considered this in some depth. The main problem is that you end up >> having to define a layout mechanism for videos if you allow multiple >> videos to be enabled from script (e.g. consider what the behaviour should >> be if you enable the main video, then the PiP sign language video, then >> disable the main video. What is the intrinsic dimension of the <video> >> element? Does it matter if you do it in a different order?). >> >> By making <video> be a single video's output layer, we can bypass many of >> these problems without removing expressibility (the author can still >> support multiple PiP videos). >> >> >>> There are also the use cases for controlling the volume of individual >>> tracks that are not addressed by the current spec design. >> >> Can you elaborate on these use cases? >> >> My assumption has been that in the long term, if you want to manipulate >> specific audio tracks, you would use an <audio> element and plug it into >> the Audio API for separate processing. 
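A rough sketch of the "separate <video> element" approach suggested above for a sign-language overlay; the file names, layout and sync handling are illustrative assumptions, not anything the spec mandates:

  <div style="position: relative">
    <video id="main" src="talk.webm" controls></video>
    <video id="signing" src="talk-signing.webm"
           style="position: absolute; right: 0; bottom: 0; width: 25%"></video>
  </div>
  <script>
    var main = document.getElementById('main');
    var signing = document.getElementById('signing');
    signing.muted = true; // only the main track provides audio
    // Keep the overlay roughly in sync with the main video.
    main.addEventListener('play', function () { signing.play(); });
    main.addEventListener('pause', function () { signing.pause(); });
    main.addEventListener('seeked', function () {
      signing.currentTime = main.currentTime;
    });
  </script>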
>> >> >> On Sat, 2 Apr 2011, Bruce Lawson wrote: >>> >>> From a comment in a blog post of mine about longdesc >>> (http://www.brucelawson.co.uk/2011/longdesc-in-html5/comment-page-1/#comment-749853) >>> I'm wondering if this is an appropriate use of <details> >>> >>> <details> >>> <summary> >>> <img src=chart.png alt="Graph of percentage of total U.S. >>> non-institutionalized population age 16-64 declaring one or more >>> disabilities"> >>> </summary> >>> <p>The bar graph shows the percentage of total U.S. non-institutionalized >>> population age 16-64 declaring one or more disabilities. The percentage >>> value for each category is as follows:</p> >>> <ul> >>> <li>Total declaring one or more >>> disabilities: 18.6 percent </li> >>> <li>Sensory (visual and hearing): 2.3 >>> percent</li> >>> <li>Physical: 6.2 percent</li> >>> <li>Mental: 3.8 percent</li> >>> <li>Self-care: 1.8 percent</li> >>> <li>Difficulty going outside the home: >>> 6.4 percent</li> >>> <li>Employment disability: 11.9 >>> percent</li> >>> </ul> >>> <p>data retrieved from <a >>> href="http://www.census.gov/prod/2003pubs/c2kbr-17.pdf" title="Link to >>> External Site" class="external">2000 U.S. Census<span> - >>> external link</span></a></p> >>> </details> >>> >>> ... thereby acting as a discoverable-by-anyone longdesc. (The example is >>> adapted from the longdesc example at >>> http://webaim.org/techniques/images/longdesc#longdesc) >>> >>> Note to grumpy people: I'm not trying to advocate abolishing longdesc, >>> just seeing whether details can be used as an alternative. >> >> It's a bit weird, but sure. >> >> (Well, except for your alt="" text, which is a title="", not an alt="".) >> >> >> On Sat, 2 Apr 2011, John Foliot wrote: >>> >>> Interesting question. Referring to the spec, I think that you may have >>> in fact uncovered a bug in the text. The spec states: >>> >>> "The user agent should allow the user to request that the details >>> be shown or hidden." >>> >>> The problem (or potential problem) here is that the behaviour is defined >>> in visual terms - >> >> The spec explicitly says that these terms have non-visual meaning. >> >> >> On Mon, 4 Apr 2011, Bjartur Thorlacius wrote: >>> >>> IMO, the specification of the <details> element is overly focused on >>> expected renderings. Rather than explicitly defining the semantics of >>> <details> with or without an @open attribute, and with or without a >>> <summary> child, sane renderings for medium to large displays with which >>> the user can interact are described, and usage is to be inferred >>> therefrom. This is suboptimal, as it allows hiding <details open>s on >>> small output windows but advises against it as strongly as against ignoring >>> addition of the open attribute. Note that the <details> element >>> represents a disclosure widget, but the contents are nowhere defined >>> (neither as additional information (that a user-agent may or may not >>> render, depending on factors such as scarcity of screen estate), nor as >>> spoiling information that shouldn't be provided to the user without >>> explicit consent). I regard the two different use cases as different, >>> even though vendors might implement both with { binding: details; } on >>> some media. <Details> can't serve both. It's often spoken of as if >>> intended for something else than the YouTube video description use case. >>> <Details> mustn't be used for hiding spoilers, or else browsers won't be >>> able to intelligently choose to render the would-be concealed contents. 
>>
>> I've clarified <details> to be better defined in this respect. I hope it addresses your concern.
>>
>>
>> On Fri, 22 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> I wonder if it makes sense to introduce a set of pseudo-classes on the video/audio elements, each reflecting a state of the media on the controls (playing/paused/error/etc.)? Then, we could use just CSS to style media controls (whether native or custom), and not have to listen to DOM events just to tweak their appearance.
>>
>> On Sat, 23 Apr 2011, Philip Jägenstedt wrote:
>>>
>>> With a sufficiently large set of pseudo-classes it might be possible to *display* most of the interesting state, but how would you *change* the state without using scripts? Play/pause, seek, volume, etc...
>>
>> On Sat, 23 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> This is not the goal of using pseudo-classes: they just provide you with a uniform way to react to changes.
>>
>> On Sat, 23 Apr 2011, Philip Jägenstedt wrote:
>>>
>>> In other words, one would still have to rely heavily on scripts to actually implement custom controls?
>>>
>>> Also, how would one style a progress bar using pseudo-classes? How about displaying elapsed/remaining time in the form MM:SS?
>>
>> On Sat, 23 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> I am not in any way trying to invent a magical way to style media controls entirely in CSS. Just trying to make the job of controls developers easier and use CSS where it's, well... useful? :)
>>
>> On Sat, 23 Apr 2011, Philip Jägenstedt wrote:
>>>
>>> Very well, what specific set of pseudo-classes do you think would be useful?
>>
>> On Sat, 23 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> I can infer what would be useful from WebKit's media controls as a first stab?
>>
>> On Mon, 25 Apr 2011, Silvia Pfeiffer wrote:
>>>
>>> A markup and CSS example would make things clearer. How do you think it would look?
>>
>> On Sun, 24 Apr 2011, Dimitri Glazkov wrote:
>>>
>>> Based on WebKit's current media controls, let's start with these pseudo-classes:
>>>
>>> Play state:
>>> - loading
>>> - playing
>>> - streaming
>>> - error
>>>
>>> Capabilities:
>>> - no-audio
>>> - no-video
>>> - has-closed-captioning
>>>
>>> So, to show a status message while the control is loading or streaming, and hide it when it's done:
>>>
>>> video -webkit-media-controls-status-display {
>>>     display: none;
>>> }
>>>
>>> video:loading -webkit-media-controls-status-display,
>>> video:streaming -webkit-media-controls-status-display {
>>>     display: initial;
>>>     ...
>>> }
>>>
>>> Similarly, to hide volume controls when there's no audio:
>>>
>>> video:no-audio -webkit-media-controls-volume-slider-container {
>>>     display: none;
>>> }
>>>
>>> Once I put these pseudo-classes in place for WebKit, a lot of the code in http://codesearch.google.com/codesearch/p#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/html/shadow/MediaControlRootElement.cpp&exact_package=chromium will go away, being replaced with straight CSS.
>>
>> Sounds to me like a poor man's XBL. I'd much rather see this addressed using a full-on binding solution, since it seems like it would be only a little more complex yet orders of magnitude more powerful.
>>
>>
>> On Fri, 13 May 2011, Narendra Sisodiya wrote:
>>>
>>> What I want is a general-purpose synchronisation mechanism whereby resources (text, video, graphics, etc.) are played over a general-purpose timer (timeline) with interaction.
>>>
>>> Ex -
>>>
>>> <resource type="html" src="asd.html" x="50%" y="50%" width="10%" height="10%" z="6" xpath="page1" tIn="5000ms" tOut="9400ms" inEffect="fadein" outEffect="fadeout" inEffectDur="1000ms" outEffectDur="3000ms"/>
>>>
>>> <resource type="html" src="Indian.ogv" x="50%" y="50%" width="10%" height="10%" z="6" xpath="page2" tIn="5000ms" tOut="9400ms" inEffect="fadein" outEffect="fadeout" inEffectDur="1000ms" outEffectDur="3000ms"/>
>>
>> Sounds like SMIL. I recommend looking into SMIL and SVG (which includes parts of SMIL).
>>
>>
>> On Fri, 13 May 2011, Philip Jägenstedt wrote:
>>>
>>> Problem:
>>>
>>> <video src="video.webm"></video>
>>> ...
>>> <script>
>>> document.querySelector('video').oncanplay = function() {
>>>   /* will it run? */
>>> };
>>> </script>
>>>
>>> In the above, the canplay event can be replaced with many others, like loadedmetadata and loadeddata. Whether or not the event handler has been registered by the time the event is fired depends on how fast decoding is, how fast the network is, and how much "..." there is.
>>
>> Yes, if you add an event listener in a task that runs after the task that fires the event could have run, you won't always catch the event.
>>
>> That's just a bug in the JS.
>>
>>
>> On Fri, 13 May 2011, Henri Sivonen wrote:
>>>
>>> <iframe src=foo.html></iframe>
>>> <script>
>>> document.querySelector('iframe').onload = function() {
>>>   /* will it run? */
>>> };
>>> </script>
>>> has the same problem. The solution is using the onload markup attribute that calls a function declared in an earlier <script>:
>>>
>>> <script>
>>> function iframeLoaded() {
>>>   /* It will run! */
>>> }
>>> </script>
>>> <iframe src=foo.html onload=iframeLoaded()></iframe>
>>
>> Exactly.
>>
>>
>> On Sat, 14 May 2011, Ojan Vafai wrote:
>>>
>>> If someone proposed a workable solution, browsers would likely implement it. I can't think of a backwards-compatible solution to this, so I agree that developers just need to learn that this is a bad pattern. I could imagine browsers logging a warning to the console in these cases, but I worry that it would fire too much in today's web.
>>
>> Indeed.
>>
>>
>>> It's unfortunate that you need to use an inline event handler instead of one registered via addEventListener to avoid the race condition. Exposing something to the platform like jQuery's live event handlers (http://api.jquery.com/live/) could mitigate this problem in practice, e.g. it would be just as easy or easier to register the event handler before the element is created.
>>
>> You can also work around it by setting src="" from script after you've used addEventListener, or by checking the state manually after you've added the handler and calling the handler if it is too late (though you have to be aware of the situation where the event is actually already scheduled and you added the listener between the time it was scheduled and the time it fired, so your function really has to be idempotent).
>>
>>
>> On Sun, 15 May 2011, Olli Pettay wrote:
>>>
>>> There is no need to use an inline event handler. One can always add a capturing listener to window, for example:
>>>
>>> window.addEventListener("canplay",
>>>   function(e) {
>>>     if (e.target == document.querySelector('video')) {
>>>       // Do something.
>>>     }
>>>   }, true);
>>>
>>> And just do that before the <video> element occurs in the page. That is simple, IMHO.
>>
>> Indeed, that is another option.
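As a small illustration of the check-the-state-manually workaround described above (this is a sketch, not spec text: the variable and function names are illustrative, HAVE_FUTURE_DATA is the readiness threshold that corresponds to canplay, and the handler is written to be idempotent because the queued event may still arrive after the manual call):

    <video id="v" src="video.webm"></video>
    <script>
      var video = document.getElementById('v');
      var started = false;

      function onCanPlay() {
        if (started) return;   // idempotent: the real event may still fire
        started = true;
        /* set up custom controls, start playback, etc. */
      }

      video.addEventListener('canplay', onCanPlay, false);

      // If canplay already fired before the listener was added, the
      // element's readyState reflects it, so call the handler directly.
      if (video.readyState >= video.HAVE_FUTURE_DATA) {
        onCanPlay();
      }
    </script>

The other workaround mentioned, leaving the element without a src in the markup and assigning src from script only after addEventListener has been called, avoids the check entirely, at the cost of not starting the fetch until the script runs.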
>>
>>
>>> (I wonder why the "firing a simple event named e" algorithm defaults to non-bubbling. It makes many things harder than they should be.)
>>
>> The default is arbitrary and doesn't affect the platform (since I have to decide with each event whether to use the default or not). Changing the default would make no difference (I'd just have to go to every place that calls the algorithm and switch it from "bubbles" to nothing and nothing to "does not bubble").
>>
>>
>> On Sun, 15 May 2011, Glenn Maynard wrote:
>>>
>>> If a MediaController is being used it's more complicated; there seems to be no way to query the readyState of a MediaController (almost, but not quite, the "most recently reported readiness state"), or to get a list of slaved media elements from a MediaController without searching for them by hand.
>>
>> If you're scripting the MediaController, the assumption is that you created it, so there's no problem. The implied MediaControllers are for the declarative case where you don't need scripting at all.
>>
>>
>> On Mon, 16 May 2011, Simon Pieters wrote:
>>>
>>> The state can have changed before the event has actually fired, since state changes are sync but the events are queued. So if the script happens to run in between, then func is run twice.
>>
>> That's true.
>>
>>
>> On Mon, 16 May 2011, Remy Sharp wrote:
>>>
>>> Now you're right, whoever pointed out the 7am alarm example: if you attach the event too late, then you'll miss the boat. However, it's a chicken-and-egg situation. You don't have the DOM so you can't attach the event handler, and if you do have the DOM, the damn event has fired already.
>>>
>>> What's the fix? Well, the workarounds are certainly viable, again from an everyman developer point of view:
>>>
>>> 1) Attach higher up, on the window object, listen for canplay/loadedmetadata/etc., and check the event.target
>>>
>>> 2) Attach an inline event handler (not nice, but will do)
>>>
>>> The fix? Since ultimately we have exactly the same potential "bug" with image load events
>>
>> Not just those; also iframes, own document navigation, sockets, XHR, anything that does asynchronous work, in fact.
>>
>>
>>> is to update the specification and make it clear: that depending on the speed of the connection and decoding, the following "xyz" events can fire **before** your script runs. Therefore, here are a couple of workarounds - or just be aware.
>>
>> I don't really know where to put this that would actually help.
>>
>>
>> On Tue, 17 May 2011, Philip Jägenstedt wrote:
>>>
>>> Still, I don't think just advocacy is any kind of solution. Given that you (the co-author of an HTML5 book) make certain assumptions about the outcome of this race condition, it's safe to assume that hordes of web developers will do the same.
>>>
>>> To target this specific pattern, one hypothetical solution would be to special-case the first script that attaches event handlers to a <video> element. After it has run, all events that were already fired before the script are fired again. However, this seems awfully messy if the script also observes readyState or networkState. It might also interfere with browsers that use scripts behind the scenes to implement the native controls.
>>>
>>> Although a kludge, another solution might be to block events from being fired until x more bytes of the document have been parsed or it has finished loading.
>>
>> On Wed, 18 May 2011, Robert O'Callahan wrote:
>>>
>>> For certain kinds of events ("load", the video events, maybe more), delay the firing of such events until, say, after DOMContentLoaded has fired. If you're careful you might be able to make this a strict subset of the behaviors currently allowed by the spec ... i.e. you're pretending that your frame, image and video loads simply didn't complete until after DOMContentLoaded fired in the outer page. That would mean it's compatible with properly-written legacy content ... if there is any.
>>>
>>> Of course I have no idea whether that approach is actually feasible :-). It obviously isn't compatible with what browsers currently do, so authors wouldn't want to rely on it for a long time, if ever.
>>
>> These don't seem like workable solutions. We can't delay load events for every image on the Web, surely. Remembering every event that's ever fired for any <img> or <video> just in case a handler is later attached seems a bit intractable, too.
>>
>> This has been a problem since JavaScript was added in the 90s. I find it hard to believe that we have to suddenly fix it now.
>>
>>
>> On Tue, 24 May 2011, Silvia Pfeiffer wrote:
>>>
>>> Ian and I had a brief conversation recently where I mentioned a problem with extended text descriptions with screen readers (and worse still with braille devices), and the suggestion was that the "paused for user interaction" state of a media element may be the solution. I would like to pick this up and discuss in detail how that would work, to confirm my sketchy understanding.
>>>
>>> *The use case:*
>>>
>>> In the specification for media elements we have a <track> kind of "descriptions", which are: "Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is unavailable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind). Synthesized as a separate audio track."
>>>
>>> I'm for now assuming that the synthesis will be done through a screen reader and not through the browser itself, thus making the descriptions available to users as synthesized audio or as braille if the screen reader is set up for a braille device.
>>>
>>> The textual descriptions are provided as chunks of text with a start and an end time (so-called "cues"). The cues are processed during video playback as the video's playback time starts to fall within the time frame of the cue. Thus, it is expected that the cues are consumed during the cue's time frame and are not present any more when the end time of the cue is reached, so they don't conflict with the video's normal audio.
>>>
>>> However, on many occasions it is not possible to consume the cue text in the given time frame. In particular not in the following situations:
>>>
>>> 1. The screen reader takes longer to read out the cue text than the cue's time frame provides for. This is particularly the case with long cue text, but also when the screen reader's reading rate is slower than what the author of the cue text expected.
>>>
>>> 2. The braille device is used for reading. Since reading braille is much slower than listening to read-out text, the cue time frame will invariably be too short.
>>>
>>> 3. The user seeked right into the middle of a cue, and thus the time frame that is available for reading out the cue text is shorter than the cue author allowed for.
>>>
>>> Correct me if I'm wrong, but it seems that what we need is a way for the screen reader to pause the video element from continuing to play while the screen reader is still busy delivering the cue text. (In a11y talk: what is required is a means to deal with "extended descriptions", which extend the timeline of the video.) Once it's finished presenting, it can resume the video element's playback.
>>
>> Is it a requirement that the user be able to use the regular video pause, play, rewind, etc. controls to seek inside the extended descriptions, or should they literally pause the video while playing, with the audio descriptions being controlled by the same UI as the screen reader?
>>
>>
>>> IIUC, a video is "paused for user interaction" basically when the UA has decided to pause the video without the user asking to pause it (i.e. the paused attribute is false) and the pausing happened not for network buffering reasons, but for other reasons. IIUC one concrete situation where this state is used is when the UA has reached the end of the resource and is waiting for more data to come (e.g. on a live stream).
>>
>> That latter state is not "paused for user interaction", it's just stalled due to lack of data. The rest is accurate though.
>>
>>
>>> To use "paused for user interaction" for extending descriptions, we need to introduce a means for the screen reader to tell the UA to pause the video when it reaches the end of the cue and it's still busy delivering a cue's text. Then, as it finishes, it will un-pause the video to let it continue playing.
>>>
>>> To me it sounds like a feasible solution.
>>>
>>> The screen reader could even provide a user setting and a shortcut so a user can decide that they don't want this pausing to happen, or that they want to move on from the current cue.
>>>
>>> Another advantage of this approach is that e.g. a deaf-blind user could hook up their braille device such that it will deliver the extended descriptions and also deliver captions through braille, with such extension pausing happening. (Not sure that such a user would even want to play the video, but it would be possible.)
>>>
>>> Now, I think there is one problem though (at least as far as I can tell). Right now, IIUC, screen readers are only passive listeners on the UA. They don't influence the behaviour of the UA. The accessibility API is basically only a one-way street from the UA to the AT. I wonder if that is a major inhibitor of using this approach, or whether it's easy for UAs to overcome this limitation? (Or if such a limitation even exists - I don't know enough about how ATs work...)
>>>
>>> Is that an issue? Are there other issues that I have overlooked?
>>
>> That seems to be entirely an implementation issue.
>>
>> --
>> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
>> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
>> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>
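For concreteness, a small sketch of the setup the extended-descriptions use case assumes, and of the pause-while-describing behaviour under discussion. The file names are placeholders, descriptionsPending() and whenDescriptionsDone() are hypothetical stand-ins for information only the screen reader actually has, and in the proposal above the pausing would be driven by the UA and the AT over the accessibility API rather than by page script:

    <video id="v" src="video.webm" controls>
      <track kind="descriptions" src="descriptions.vtt" srclang="en"
             label="Audio descriptions">
    </video>
    <script>
      var video = document.getElementById('v');
      var track = video.textTracks[0];
      // Cues are only processed while the track is enabled; the mode API was
      // numeric in 2011-era drafts and is string-valued in later ones.
      track.mode = (typeof track.HIDDEN !== 'undefined') ? track.HIDDEN : 'hidden';
      track.addEventListener('cuechange', function () {
        // A cue just ended but its description is still being presented:
        // extend the timeline by pausing, then resume once it is done.
        if (this.activeCues.length === 0 && descriptionsPending()) {
          video.pause();
          whenDescriptionsDone(function () { video.play(); });
        }
      }, false);
    </script>

The one-way-street limitation raised at the end of the message is exactly what such a script-level approximation sidesteps: the page can pause and resume itself, whereas the screen reader, as things stood, could not.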