- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Tue, 15 Feb 2011 22:21:46 +1100
Philip,

As promised here is the summarized list of things that after this
discussion I still think we should add/change:

* the file magic string should not be "WEBVTT FILE", but "WEBVTT" only
  (or alternatively "WebVTT", but typically magic identifiers are all
  caps)

* allow for name-value pairs as file-wide metadata underneath the file
  magic string and specify the format for providing name-value pairs
  - only an empty line determines the end of the header section

* allow the use of shorter time specifiers, in particular:
  - "[[h*:]mm:]ss[.d[c[m]]] | s*[.d[c[m]]]" as the start and end time
  - "-" as the separator between start and end time instead of "-->"
  - "+s*[.d[c[m]]]" as a possible end time specifier, or a relative
    mid-cue timestamp; relative mid-cue timestamps accumulate, each
    being an offset from the previous one

* allow commenting out whole lines after a "//" (or a "#") at the line
  start

* use more verbose cue settings: direction (instead of D), linePosition
  (instead of L), textPosition (instead of T), size (instead of S),
  align (instead of A)

* introduce default cue settings in the header part of the file,
  possibly as a name-value pair, or some alternative dedicated form

* allow the use of the <u> element for underlined sections (assuming I
  can find some examples for this)

All the rest of the issues seem to be covered through the <c> element
and classes.

Cheers,
Silvia.

On Tue, Feb 15, 2011 at 10:20 PM, Silvia Pfeiffer <silviapfeiffer1 at gmail.com> wrote:
> On Tue, Feb 15, 2011 at 9:09 PM, Philip Jägenstedt <philipj at opera.com> wrote:
>> On Tue, 15 Feb 2011 04:28:36 +0100, Silvia Pfeiffer
>> <silviapfeiffer1 at gmail.com> wrote:
>>
>>> Hi Philip,
>>>
>>> On Tue, Feb 15, 2011 at 3:27 AM, Philip Jägenstedt <philipj at opera.com>
>>> wrote:
>>>>
>>>> On Wed, 09 Feb 2011 03:57:37 +0100, Silvia Pfeiffer
>>>> <silviapfeiffer1 at gmail.com> wrote:
>>>>
>>>>>>> A. Feedback on the WebVTT format
>>>>>>
>>>>>>> 1. Introduce file-wide metadata
>>>>>>>
>>>>>>> WebVTT requires a structure to add header-style metadata. We are here
>>>>>>> talking about lists of name-value pairs as typically in use for header
>>>>>>> information. The metadata can be optional, but we need a defined means
>>>>>>> of adding them.
>>>>>>>
>>>>>>> Required attributes in WebVTT files should be the main language in use
>>>>>>> and the kind of data found in the WebVTT file - information that is
>>>>>>> currently provided in the <track> element by the @srclang and @kind
>>>>>>> attributes. These are necessary to allow the files to be interpreted
>>>>>>> correctly by non-browser applications, for transcoding or to determine
>>>>>>> if a file was created as a caption file or something else, in
>>>>>>> particular the @kind=metadata. @srclang also sets the base
>>>>>>> directionality for BiDi calculations.
>>>>>>
>>>>>> Are there non-browsers that use the language for font-selection or
>>>>>> bidi?
>>>>>> Is
>>>>>> auto-detection not likely to give a better user experience? Are there
>>>>>> any
>>>>>> other use cases for knowing the language of the captions *after*
>>>>>> they've
>>>>>> been opened?
>>>>>
>>>>>
>>>>> I can't see a different way to let non-browser applications know what
>>>>> font to choose, even how to provide the user with a menu of available
>>>>> caption tracks for a video, or to set the base directionality for
>>>>> BiDi. Also, language auto-detection is a huge burden to put onto
>>>>> non-browser applications. Having a readable language tag at the
>>>>> beginning of the file is useful to quickly figure it all out.
>>>>>
>>>>> The language set in <track> would certainly overrule what is in the
>>>>> file. Also, the last language attribute in the header would probably
>>>>> win.
>>>>>
>>>>> I guess it would also be ok to have language and kind optional -
>>>>> different applications may then default to interpreting WebVTT files
>>>>> differently, such as by default English and Captions - or English and
>>>>> Descriptions, but that's probably acceptable from context.
>>>>
>>>> Given that most existing subtitle formats don't have any language
>>>> metadata,
>>>> I'm a bit skeptical. However, if implementors of non-browser players want
>>>> to
>>>> implement WebVTT and ask for this I won't stand in the way (not that I
>>>> could
>>>> if I wanted to). For simplicity, I'd prefer the language metadata from
>>>> the
>>>> file to not have any effect on browsers though, even if no language is
>>>> given
>>>> on <track>.
>>>
>>> There is also the Content-Language response header of HTTP, which
>>> could have an influence on the browser, too. I'm not sure about the
>>> best way to deal with all this overlapping information, but I'm sure
>>> it can be sorted out.
>>
>> My preference is ignoring everything except what is given in <track>. In
>> particular language can't be given in the resource or its headers, because
>> then one has to fetch all the tracks in order to provide a track selection
>> menu with language information or to automatically activate the suitable
>> tracks.
>
> Ah yes, that makes sense. I'd have to agree.
>
>
>
>>>>>> Why do non-browser players need to know the kind? All kinds are
>>>>>> processed
>>>>>> in
>>>>>> the same way except metadata, and there's no reason to use metadata
>>>>>> tracks
>>>>>> with external players.
>>>>>
>>>>> Maybe I have a different view of what applications will make use of
>>>>> WebVTT files than most. My thinking is that there will also be uses
>>>>> for metadata tracks in external applications. Aside from this, there
>>>>> will be authoring applications and players, yes, but there will also
>>>>> be automated processing tools. So, to know what type of content is
>>>>> inside a file without having to look at more than the file's headers
>>>>> is really important.
>>>>
>>>> For both of these cases, putting some magic strings inside comments that
>>>> are
>>>> ignored by browsers sounds like it would be sufficient. Name-value
>>>> metadata
>>>> that is ignored by browsers would be fine as well.
>>>
>>> I'm for the second option: name-value metadata that is ignored by the
>>> browser. I think in fact the browser should in general ignore all
>>> name-value metadata with the exception of file-wide cue settings.
>>
>> I agree, browsers should ignore in-file metadata. (That's one reason I think
>> using comments for it is quite fine most of the time.)
>
> Maybe then we should find a different way to set the default settings
> for the cues and not in a CueSettings=... metadata field. It seemed
> elegant, but I am not so sure any more.
>
>
>
>>>>>>> Further metadata fields that are typically used by authors to keep
>>>>>>> specific authoring information or usage hints are necessary, too. As
>>>>>>> examples of current use see the format of MPlayer mpsub's header
>>>>>>> metadata [2], EBU STL's General Subtitle Information block [3], and
>>>>>>> even CEA-608's Extended Data Service with its StartDate, Station,
>>>>>>> Program, Category and TVRating information [4]. Rather than specifying
>>>>>>> a specific subset of potential fields we recommend to just have the
>>>>>>> means to provide name-value pairs and leave it to the negotiation
>>>>>>> between the author and the publisher which fields they expect of each
>>>>>>> other.
>>>>>>
>>>>>> This approach has worked very well with Vorbis Comments, probably
>>>>>> mostly
>>>>>> because all interesting fields have been pre-defined in
>>>>>> http://www.xiph.org/vorbis/doc/v-comment.html
>>>>>>
>>>>>> For a web format though, wouldn't some kind of wiki registry be good to
>>>>>> avoid total mayhem, especially if there are some predefined fields?
>>>>>> (Not
>>>>>> having file-wide metadata would also avoid such mayhem.)
>>>>>
>>>>> It might be good to define a base set - the Vorbis Comments one or the
>>>>> ID3 ones could be appropriate. Even the old Dublin Core set (the first
>>>>> ones, not the current chaos) could be good. I could also analyse the
>>>>> sets used in current typical caption formats and propose a superset of
>>>>> those.
>>>>>
>>>>> While I think you're right with suggesting a predefined set of fields,
>>>>> I am mostly keen right now to agree on the general format of the
>>>>> fields and how we need to parse them rather than what they actually
>>>>> are.
>>>>>
>>>>> So, I would suggest we allow lines of "name=value" under the WEBVTT
>>>>> magic string. A blank line defines the end of the header section and
>>>>> the beginning of the cues. Would be simple enough to parse, right?
>>>>
>>>> Sure, it's already handled by the current parsing spec, since it ignores
>>>> everything up to the first blank line.
>>>
>>> That's not quite how I'm reading the spec.
>>>
>>>
>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#webvtt-0
>>> allows
>>> "Optionally, either a U+0020 SPACE character or a U+0009 CHARACTER
>>> TABULATION (tab) character followed by any number of characters that
>>> are not U+000A LINE FEED (LF) or U+000D CARRIAGE RETURN (CR)
>>> characters."
>>> after the "WEBVTT FILE" magic.
>>> To me that reads like all of the extra stuff has to be on the same line.
>>> I'd prefer if this read "any character except for two WebVTT line
>>> terminators", then it would all be ready for such header-style
>>> metadata.
>>
>> See steps 12-17 of
>> <http://www.whatwg.org/specs/web-apps/current-work/multipage/video.html#parsing-0>,
>> it just skips all lines up to the first blank line. Syntax and parsing are
>> different :)
>
> So it's not in the syntax spec, but acceptable input, hmmm. I think we
> should add it explicitly to the spec and define the general way in
> which metadata is supposed to be given, such as in the form
> <name>=<value>. We don't have to parse it, but it should be in the
> syntax specification.
>
>
>
>>>>>>> 4. Cue formatting requirements
>>>>>>>
>>>>>>> In analysing the available cue formatting functionality, we have found
>>>>>>> that some features are missing. Most of these features can be added
>>>>>>> through using CSS on cues that have received a <b>, <i>, <c> or <v>
>>>>>>> marker. The following features are core to traditional TV and exist in
>>>>>>> EBU STL and CEA-608/708 captions. Support of these will be a core
>>>>>>> requirement for browsers as well as non-browser applications and it
>>>>>>> makes sense to add these to WebVTT rather than relying on external CSS
>>>>>>> which cannot be used for non-browser captions:
>>>>>>
>>>>>> The unstated requirement here seems to be that WebVTT needs to work as
>>>>>> an
>>>>>> interchange format for various TV captioning formats even in user
>>>>>> agents
>>>>>> without any support for CSS (or JavaScript). I'm trying to not make a
>>>>>> straw
>>>>>> man argument, but if we want an interchange format, we should pick TTML,
>>>>>> which
>>>>>> is explicitly designed to be just that and doesn't depend on CSS.
>>>>>>
>>>>>> Is it not enough that a lossy conversion can be made from various
>>>>>> formats
>>>>>> into WebVTT+CSS(+JavaScript)? If not, the "Web" in "WebVTT" is highly
>>>>>> misleading...
>>>>>
>>>>>
>>>>> We're trying to avoid the need for multiple transcodings and are
>>>>> trying to achieve something like the following pipeline:
>>>>> broadcast captions -> transcode to WebVTT -> show in browser ->
>>>>> transcode to broadcast devices -> show
>>>>>
>>>>> If we have to plug TTML into this pipeline, too, it will be much
>>>>> slower and we would need to additionally define a mapping from TTML to
>>>>> WebVTT and back.
>>>>>
>>>>> I'm sure with SMPTE-TT around we will end up seeing things like
>>>>> broadcast->TTML->WebVTT->browser, but even then we don't want WebVTT
>>>>> to be a lossy format.
>>>>
>>>> I can only disagree. Trying to make WebVTT into an interchange format
>>>> will
>>>> inevitably turn it into a highly presentational format with lots of
>>>> legacy
>>>> baggage. I can certainly see the use cases for an interchange format, but
>>>> I
>>>> don't think it's worth the added complexity. I'd prefer an approach where
>>>> any format quirks that can't be mapped to WebVTT are expressed using
>>>> <c.foo>
>>>> and if it turns out lots of people want the feature, we can add it to a
>>>> future revision.
>>>
>>> I wouldn't go as far as to say it needs to become an interchange
>>> format. But I can see us specifying what the browser parses, while
>>> giving options such as the header-metadata and span classes that allow
>>> with some extra information to fully recover the broadcast
>>> functionality. I actually think that is almost possible already.
>>
>> After this thread has run for a while, it'd be nice to hear where you think
>> <c.foo> isn't enough and new markup is needed, if anything.
>
> I'll give a summary in a separate email so it's easier to see.
>
>
>
>>>>>>> * underline: EBU STL, CEA-608 and CEA-708 support underlining of
>>>>>>> characters. The underline character is also particularly important for
>>>>>>> some Asian languages. Please make it possible to provide text
>>>>>>> underlines without the use of CSS in WebVTT.
>>>>>>
>>>>>> Which Asian languages? If it's just the Chinese
>>>>>> <http://en.wikipedia.org/wiki/Proper_name_mark>, then I don't think
>>>>>> that
>>>>>> needs <u> or similar. In my experience, use of the Chinese proper name
>>>>>> mark
>>>>>> is in fact extremely rare in Chinese captions, at least in movies and
>>>>>> TV
>>>>>> series from the mainland and Taiwan. It would be best to use e.g.
>>>>>> ???<c.pnm>??</c> to make it easy to change the style between
>>>>>> single/double/wavy/no underline.
>>>>>
>>>>> OK. So if we need underlined text, it will need to be
>>>>> <c.underline>..</c> and CSS underline? I guess in a Web context
>>>>> underline text is usually a hyperlink so it makes sense to discourage
>>>>> <u> for the Web. But is that also an argument for
>>>>> captions/subtitles/descriptions? What is the argument against using
>>>>> <u> in captions?
>>>>
>>>> I don't really have an argument against it, I just questioned that it is
>>>> important for Asian languages in particular. Adding <u> would be really
>>>> simple, it's just a question of why. I've seldom seen underlining in
>>>> captions, so it's not clear to me how it's usually used.
>>>
>>> I'm told <u> is fairly common in traditional captions. We don't do
>>> <c.italics> either for such common stuff.
>>> But if we really don't want this, I guess <c.u> would work, too and is
>>> not that much longer.
>>
>> I can't see any underlining when scanning through the samples at
>> <http://wiki.whatwg.org/wiki/Use_cases_for_timed_tracks_rendered_over_video_by_the_UA>.
>> If it is in fact common in some contexts, it'd be great to have samples
>> added to the wiki, I'm sure we could learn something from it. If <u> is
>> actually useful for something, then we should just add it.
>
> I've asked for examples - I personally don't have any either, unfortunately.
>
>
>>>>> With "-" you are referring to replacing "-->" with "-" to arrive at
>>>>> things
>>>>> like:
>>>>> 15.000-17.950
>>>>> At the left we can see...
>>>>>
>>>>> as compared to:
>>>>> 15.000+2.950
>>>>> At the left we can see...
>>>>
>>>> Yes, that's what I meant.
>>>>
>>>>> I actually think they read fairly well given that people are used to the
>>>>> double meaning of "-": to mean both "from ... to" and "minus".
>>>>> But we could use a different character for "absolute time" if you
>>>>> prefer, e.g. "/".
>>>>> 15.000/17.950
>>>>> At the left we can see...
>>>>>
>>>>> I find this fairly readable, too.
>>>>
>>>> Either would work for me. As I mentioned, the room for improvement here
>>>> isn't only the syntax of the timing line, but also to make it obvious
>>>> that
>>>> cue timestamps like <00:01.000> are relative. Using + for relative
>>>> timestamps is potentially confusing too, as one might think that many
>>>> consecutive <+00:01.000> are cumulative, rather than all being 1 second
>>>> from
>>>> the start time of the cue.
>>>
>>> That's true and in fact the way in which I have authored my examples,
>>> now that I look back at them. It makes the timings smaller and I think
>>> it's a bit more logical. But really we just have to decide on one
>>> meaning:
>>>
>>> 5-10
>>> This <+1>is <+1>a <+1>simple <+1>example.
>>>
>>> I find I actually prefer this over
>>>
>>> 5-10
>>> This <+1>is <+2>a <+3>simple <+4>example.
>>
>> Right, we just have to pick something. I'd like to get the basic structure
>> down soon, though, as changing the timestamp parsing will be very difficult
>> once there are implementations.
>
>
> Agreed. Which one would you prefer?
>
>
>
>>>>>>> 7. Comments
>>>>>>
>>>>>>> we recommend the introduction of comments.
>>>>>>
>>>>>> I agree and think it needs to happen before WebVTT starts to get
>>>>>> implemented
>>>>>> and used on the web. In other words: now.
>>>>>
>>>>> Agreed. I'm happy for the previously suggested "//" at the line start
>>>>> to be comments, or, for that matter, "#" or ";" or any other special
>>>>> character. I would prefer not to use "/*" since it implies a "*/" is
>>>>> required to end the comment. Similarly we should avoid "<!--" and
>>>>> "-->" or anything else that requires a special comment end mark and
>>>>> more than one or two characters.
>>>>
>>>> I'd quite like to have block comments, so I think the best options are:
>>>>
>>>> 1. // and /* */ like JavaScript
>>>> 2. <!-- --> like HTML/XML
>>>
>>> If the main use case for the comments is to comment out a line,
>>> something at the line start alone would be sufficient. If we have to
>>> have both, I would prefer the shorter first option.
>>>
>>>> I think that the main difficulty is actually not picking a syntax, but
>>>> deciding how it works in the parser. Unlike HTML, I don't think we want
>>>> the
>>>> comments to show up in the "DOM", since that would only work for
>>>> intra-cue
>>>> comments. Ideally it would be preprocessor-ish, but yet the magic bytes
>>>> ("WEBVTT FILE") should be checked first as otherwise identifying WebVTT
>>>> would require implementing its preprocessor steps :/
>>>
>>> As I would not want the comments not to be handed into the DOM or to
>>> JavaScript, it doesn't matter if they are not like HTML. I would
>>> regard them more as pre-processor style comments.
>
>
> Ups, there was a surplus second "not". :-) I also don't want them
> handed into the DOM.
>
>
>> For simplicity, perhaps it would be better to have line-comments only. On my
>> wishlist I have a less convoluted parser definition which operates on lines
>> instead of sprinkling CR/LF all over, and it'd be easy to add line-comments
>> to such a parser. Wish-list item requested at
>> <http://www.w3.org/Bugs/Public/show_bug.cgi?id=12076>.
>
> I agree. It was with a line-based parsing in mind that I preferred the
> start-of-line comments. I don't really want to make them more
> complicated than that.
>
>
>
>>>>>>> 8. Line wrapping
>>>>>>>
>>>>>>> CEA-708 captions support automatic line wrapping in a more
>>>>>>> sophisticated way than WebVTT -- see
>>>>>>> http://en.wikipedia.org/wiki/CEA-708#Word_wrap.
>>>>>>>
>>>>>>> In our experience with YouTube we have found that in certain
>>>>>>> situations this type of automatic line wrapping is very useful.
>>>>>>> Captions that were authored for display in a full-screen video may
>>>>>>> contain too many words to be displayed fully within the actual video
>>>>>>> presentation (note that mobile / desktop / internet TV devices may
>>>>>>> each have a different amount of space available, and embedded videos
>>>>>>> may be of arbitrary sizes). Furthermore, user-selected fonts or font
>>>>>>> sizes may be larger than expected, especially for viewers who need
>>>>>>> larger print.
>>>>>>>
>>>>>>> WebVTT as currently specified wraps text at the edge of their
>>>>>>> containing blocks, regardless of the value of the 'white-space'
>>>>>>> property, even if doing so requires splitting a word where there is no
>>>>>>> line breaking opportunity. This will tend to create poor quality
>>>>>>> captions. For languages where it makes sense, line wrapping should
>>>>>>> only be possible at carriage return, space, or hyphen characters, but
>>>>>>> not on &nbsp; characters. (Note that CEA-708 also contains
>>>>>>> non-breaking space and non-breaking transparent space characters to
>>>>>>> help control wrapping.) However, this algorithm will not necessarily
>>>>>>> work for all languages.
>>>>>>>
>>>>>>> We therefore suggest that a better solution for line wrapping would be
>>>>>>> to use the existing line wrapping algorithms of browsers, which are
>>>>>>> presumably already language-sensitive.
>>>>>>>
>>>>>>> [Note: the YouTube line wrapping algorithm goes even further by
>>>>>>> splitting single caption cues into multiple cues if there is too much
>>>>>>> text to reasonably fit within the area. YouTube then adjusts the times
>>>>>>> of these caption cues so they appear sequentially. Perhaps this could
>>>>>>> be mentioned as another option for server-side tools.]
>>>>>>
>>>>>> Yeah, with SRT people are manually line-wrapping when authoring the
>>>>>> captions
>>>>>> and often enough the end result is that you get something rendered:
>>>>>>
>>>>>> - Who could have guessed that not all fonts are the same
>>>>>> size?
>>>>>> - That's news to me, so I get four lines of text where I
>>>>>> wanted two!
>>>>>>
>>>>>> I'm inclined to say that we should normalize all whitespace during
>>>>>> parsing
>>>>>> and not have explicit line breaks at all. If people really want two
>>>>>> lines,
>>>>>> they should use two cues. In practice, I don't know how well that would
>>>>>> fare, though. What other solutions are there?
>>>>>
>>>>> I don't think I would go that far. The concern has mostly been with
>>>>> the line wrapping of lines that are too long and the possibility of
>>>>> splitting words that way. The particular concern was with this
>>>>> paragraph:
>>>>>
>>>>> "Text runs must be wrapped at the edge of their containing blocks,
>>>>> regardless of the value of the 'white-space' property, even if doing
>>>>> so requires splitting a word where there is no line breaking
>>>>> opportunity."
>>>>> see
>>>>>
>>>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/rendering.html#timed-text-tracks-0
>>>>>
>>>>> So we want to avoid splitting mid-word and we suggest introducing the
>>>>> ability to have non-breaking spaces.
>>>>
>>>> I think splitting in the middle of words would only happen for words that
>>>> are longer than the whole line.
>>>
>>> Ah ok - I guess you can interpret the sentence above in this way as
>>> in "splitting a word ONLY where there is no line breaking opportunity".
>>> Then it's probably ok. It would still make sense to accept
>>> non-breaking spaces.
>>
>> Perhaps Hixie would like to clarify in the spec precisely what is meant?
>>
>> There's already a non-breaking space in Unicode: NO-BREAK SPACE (U+00A0)
>
> Ah, ok, that's covered then, too. Good to know, thanks. I was thinking
> about &nbsp; of course, but we don't need that in a UTF-8 document
> then.
>
>
>>>> There's still plenty of room for improvements in line wrapping, though.
>>>> It
>>>> seems to me that the main reason that people line wrap captions manually
>>>> is
>>>> to avoid getting two lines of very different length, as that looks quite
>>>> unbalanced. There's no way to make that happen with CSS, and AFAIK it's
>>>> not
>>>> done by the WebVTT rendering spec either.
>>>
>>> People split manually when they want quality captions and can visually
>>> test what it will look like.
>>>
>>> This endeavor has one big problem: when you change the video size,
>>> e.g. go to full screen, your optimisation for the previous size is
>>> likely to not be optimal for the new size any more. There, an
>>> automatic line balancing that makes use of commas and "and"s for
>>> choosing likely good line break positions would be nice.
>>>
>>> A completely different situation appears when the captions are not
>>> manually created, as is the case in YouTube. Even when you submit a
>>> perfect transcript and time-align it through speech recognition, you
>>> will only do the line breaks as you have to render cues. To achieve a
>>> better quality there, a better line-break algorithm would help
>>> massively.
>>>
>>> So, I agree with you about improving the line wrapping. I also think
>>> it is likely something that we have to leave to the browsers - at
>>> least for now.
>>
>> Right, some experimentation here would be great, as I haven't seen any
>> feature like this in any media players. In the hope of inspiring someone,
>> perhaps myself, here's how I tentatively would like things to work:
>>
>> 1. Authors are encouraged to not manually line-break
>> 2. UAs render the text at whatever width the <video> container allows, with
>> margins and all
>> 3. The text will have been rendered on n lines.
>> 4. Decrease the width on the container as much as possible while having n
>> lines.
>> 5. Use that line-breaking and then do whatever left/center/right-alignment
>> relative to the original width.
>>
>> I really should get around to reading the rendering section for WebVTT to
>> see what it actually does, perhaps it's already clever...
>
>
> It is quite clever indeed. And now that we have cleared up the line
> breaking issue and the non-breaking space, I think it's as good as
> needs be right now.
>
>
>
>>>>>>> 4. Addressing individual cues through CSS
>>>>>>>
>>>>>>> As far as we understand, you can currently address all cues through
>>>>>>> ::cue and you can address a cue part through ::cue-part(<voice> ||
>>>>>>> <part> || <position> || <future-compatibility>). However, if we
>>>>>>> understand correctly, it doesn't seem to be possible to address an
>>>>>>> individual cue through CSS, even though cues have individual
>>>>>>> identifiers. This is either an oversight or a misunderstanding on our
>>>>>>> parts. Can you please clarify how it is possible to address an
>>>>>>> individual cue through CSS?
>>>>>>
>>>>>> Since I've been arguing against the id's in WebVTT, I'm curious about
>>>>>> the
>>>>>> use case here. Isn't using a unique class good enough?
>>>>>
>>>>> This links in with the discussion above on CSS styling and classes.
>>>>> Rather than define classes of cue settings and reference them from the
>>>>> cues, this allows them to be applied to individual cues in style
>>>>> sheets. I thought the whole reason of cue identifiers was to have this
>>>>> addressing functionality, so this would just close the loop.
>>>>>
>>>>> For example:
>>>>>
>>>>> Style sheet of the Web page:
>>>>> <style>
>>>>> video track#t1 ::cue(cue10) {
>>>>>   text-decoration: blink;
>>>>> }
>>>>> </style>
>>>>>
>>>>> The Web page (extract):
>>>>> <video src="video.webm" controls>
>>>>>   <track id="t1" label="captions" kind="captions" srclang="en-US"
>>>>> src="cap1.vtt"/>
>>>>> </video>
>>>>>
>>>>> The caption file cap1.vtt:
>>>>> WEBVTT
>>>>> Language=en-US
>>>>> Kind=Captions
>>>>>
>>>>> cue1
>>>>> 0.000-5.000
>>>>> blab blah
>>>>>
>>>>> cue10
>>>>> 40.000-60.000
>>>>> ALERT: Your basement is flooding - evacuate!
>>>>>
>>>>>
>>>>> Cue10 is addressed through CSS and turned into a blinking text without
>>>>> a need to change the markup at all.
>>>>
>>>> My point was that you could just as well do this:
>>>>
>>>> 0.000-5.000
>>>> <c.cue1>blab blah</c>
>>>>
>>>> In my view of things, id's in HTML are primarily for addressing via
>>>> #fragments and as hooks for scripts, for styling class is quite
>>>> sufficient,
>>>> so I'm thinking it would be for WebVTT as well.
>>>
>>> I quite like the idea of using the identifiers for named media
>>> fragment URIs: e.g. http://example.org/video.webm#cue10 . We need
>>> identifiers for this. Also, I find them less intrusive in the text
>>> than <c.cue1> which defines a class that is only ever used on this
>>> single cue.
>>
>> Hmm, isn't that what we have chapters for? Or do you want to use id's for
>> some kind of inline chapters?
>
>
> AFAICT we would address chapters also by their identifier, so there is
> no difference between the kinds of tracks that we have and the way in
> which we would address into them.
>
>
>>>>>>> 5. Ability to move captions out of the way
>>>>>>>
>>>>>>> Our experience with automated caption creation and positioning on
>>>>>>> YouTube indicates that it is almost impossible to always place the
>>>>>>> captions out of the way of where a user may be interested to look at.
>>>>>>> We therefore allow users to dynamically move the caption rendering
>>>>>>> area to a different viewport position to reveal what is underneath. We
>>>>>>> recommend such drag-and-drop functionality also be made available for
>>>>>>> TimedTrack captions on the Web, especially when no specific
>>>>>>> positioning information is provided.
>>>>>>
>>>>>> This would indeed be rather nice, but wouldn't it interfere with text
>>>>>> selection? Detaching the captions into a floating, draggable window via
>>>>>> the
>>>>>> context menu would be a theoretically possible solution, but that's
>>>>>> getting
>>>>>> rather far ahead of ourselves before we have basic captioning support.
>>>>>
>>>>> On YouTube you can only move them within the video viewport. You
>>>>> should try it - it's really awesome actually.
>>>>>
>>>>> When you say "interfere with text selection" are you suggesting that
>>>>> the text of captions/subtitles should be able to be cut and pasted? I
>>>>> wonder what copyright holders think about that.
>>>>
>>>> Being able to select the captions just like any other text is a great
>>>> thing
>>>> that I wouldn't want to disable. It's very useful if you want to pause
>>>> and
>>>> look up the definition of a word or to report a typo in the captions
>>>> without
>>>> having to retype the whole text.
>>>
>>> I guess you can have all of that as you can have it on Web pages, too.
>>> If you click and hold, it will be grabbing for moving. If you double
>>> click it is text selection for cut and paste. So, I don't think there
>>> would be a problem.
>>
>> That would work, but I have to admit I've never seen a web page/browser
>> combination that does what you suggest. Just single clicking and dragging is
>> certainly the most discoverable form of text selection.
>
> I actually meant: you select a piece of text (either with double click
> or with click and pull and release) and when you click and hold the
> selected text, you can move it. But also, text that is in a block and
> clearly discerned as an entity (e.g. with a line around it or such)
> can often be moved by just clicking on the box/block (outside the text
> itself) and moving the pointer. It's this kind of interaction I had
> in mind.
>
> Cheers,
> Silvia.
>
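
For illustration only, the syntax changes summarized at the top of this
message can be sketched as a small line-based parser. This is a minimal
sketch under stated assumptions, not a reference implementation: the
function names (parse_time, parse_seconds, parse_timing, parse_file), the
use of integer milliseconds, and the choice of Python are all made up for
this example, and none of it is part of the WebVTT spec. It assumes the
"WEBVTT" magic is checked before comments are stripped, that "//" or "#"
at the line start comments out the whole line, that the header is a run
of "name=value" lines ended by the first blank line, and that "+" on the
timing line gives a duration relative to the start time.

```python
import re

# Timestamp forms from the proposal:
#   "[[h*:]mm:]ss[.d[c[m]]]" (clock form) or "s*[.d[c[m]]]" (plain seconds)
_CLOCK = re.compile(r"^(?:(?:(\d+):)?(\d{2}):)?(\d{2})(?:\.(\d{1,3}))?$")
_SECONDS = re.compile(r"^(\d+)(?:\.(\d{1,3}))?$")
# Proposed timing line: "start-end" (absolute end) or "start+duration".
_TIMING = re.compile(r"^([\d:.]+)([-+])([\d:.]+)$")


def _millis(fraction):
    """Turn an optional 1-3 digit fraction into milliseconds."""
    return int(fraction.ljust(3, "0")) if fraction else 0


def parse_seconds(text):
    """Parse the plain-seconds form, e.g. "2.95"; returns milliseconds."""
    m = _SECONDS.match(text)
    if not m:
        raise ValueError("bad seconds value: %r" % text)
    return int(m.group(1)) * 1000 + _millis(m.group(2))


def parse_time(text):
    """Parse either timestamp form; returns milliseconds."""
    if _SECONDS.match(text):
        return parse_seconds(text)
    m = _CLOCK.match(text)
    if not m:
        raise ValueError("bad timestamp: %r" % text)
    hours, minutes, seconds, fraction = m.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return total * 1000 + _millis(fraction)


def parse_timing(line):
    """Parse "start-end" or "start+duration" into (start_ms, end_ms)."""
    m = _TIMING.match(line.strip())
    if not m:
        raise ValueError("bad timing line: %r" % line)
    start = parse_time(m.group(1))
    if m.group(2) == "+":
        return start, start + parse_seconds(m.group(3))
    return start, parse_time(m.group(3))


def _make_cue(block):
    """Build one cue from its lines; the identifier line is optional."""
    ident = None
    if not _TIMING.match(block[0].strip()):
        ident, block = block[0].strip(), block[1:]
    if not block:
        raise ValueError("cue %r has no timing line" % ident)
    start, end = parse_timing(block[0])
    return {"id": ident, "start": start, "end": end,
            "text": "\n".join(block[1:])}


def parse_file(text):
    """Return (header_dict, cue_list) for a file in the proposed syntax."""
    lines = text.splitlines()
    # The magic string is checked first, before any comment handling.
    if not lines or lines[0].strip() != "WEBVTT":
        raise ValueError("missing WEBVTT magic string")
    # Whole-line comments: "//" or "#" at the line start.
    body = [l for l in lines[1:] if not l.startswith(("//", "#"))]
    header, i = {}, 0
    # Header: name=value lines up to the first blank line.
    while i < len(body) and body[i].strip():
        name, sep, value = body[i].partition("=")
        if sep:
            header[name.strip()] = value.strip()
        i += 1
    cues, block = [], []
    # Cues: blocks of lines separated by blank lines.
    for line in body[i:] + [""]:
        if line.strip():
            block.append(line)
        elif block:
            cues.append(_make_cue(block))
            block = []
    return header, cues
```

Calling parse_file on the cap1.vtt example quoted above would give a
header with Language and Kind entries and two cues keyed by their
identifiers. Integer milliseconds are used here only to keep comparisons
exact; relative mid-cue timestamps inside the cue text are deliberately
left unparsed, since their meaning is still undecided in this thread.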
Received on Tuesday, 15 February 2011 03:21:46 UTC