From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 8 Oct 2007 12:22:43 +0300
(Heavy quote snipping. Picking on particular points.)

On Oct 8, 2007, at 03:14, Silvia Pfeiffer wrote:

> This is both, more generic than captions, and less generic in that
> captions have formatting and are displayed in a particular way.

I think we should avoid overdoing captioning or subtitling by engineering excessive formatting. If we consider how subtitling works with legacy channels (TV and movie theaters), the text is always in the same sans-serif font with white fill and black outline, located at the bottom of the video frame (optionally located at the top when there's relevant native text at the bottom, and optionally italicized). To get feature parity with the legacy that is "good enough", the only formatting options you need are putting the text at the top of the video frame as opposed to the bottom and optionally italicizing text runs. (It follows that I think the idea of using SVG for captioning or subtitles is excessive.)

I wouldn't mind an upgrade path that allowed CSS font properties for captioning and subtitles, but I think we shouldn't let formatting hold back the first iteration.

> (colours, alignment etc. - the things that the EBU
> subtitling standard http://www.limeboy.com/support.php?kbID=12 is
> providing).

The EBU format seems severely legacy from the Unicode point of view. :-(

> Another option would be to disregard CMML completely and invent a new
> timed text logical bitstream for Ogg which would just have the
> subtitles. This could use any existing time text format and would just
> require a bitstream mapping for Ogg, which should not be hard to do at
> all.

Is 3GPP Timed Text a.k.a. MPEG-4 Part 17 unencumbered? (IANAL, and this isn't an endorsement of the format--just a question.)

> an alternate audio track (e.g. speex as suggested by you for
> accessibility to blind people),

My understanding is that, at least conceptually, an audio description track is *supplementary* to the normal sound track. Could someone who knows more about the production of audio descriptions please comment on whether audio description can in practice be implemented as a supplementary sound track that plays concurrently with the main sound track (in which case Speex would be appropriate), or whether the main sound must be manually mixed differently when description is present?

> and several caption tracks (for different languages),

I think it needs emphasizing that captioning (for the deaf) and translation subtitling (for people who can hear but who can't follow the language) are distinctly different in terms of metadata flagging needs and playback defaults. Moreover, although translations into multiple languages are nice to have, they complicate UI and metadata considerably, and packaging multiple translations in one file is outside the scope of HTML5 as far as the current Design Principles draft (from the W3C side) goes.

I think we should first focus on two kinds of qualitatively different timed text (differing in metadata and playback defaults):

1) Captions for the deaf:
 * Written in the language in which the speech content of the video is spoken.
 * May have speaker identification text.
 * May indicate other relevant sounds textually.
 * Don't indicate text that can be seen in the video frame.
 * Not rendered by default.
 * Enabled by a browser-wide "I am deaf or my device doesn't do sound out" pref.

2) Subtitles for people who can't follow foreign-language speech:
 * Written in the language of the site that embeds the video when there's speech in another language.
 * Don't identify the speaker.
 * Don't identify sounds.
 * Translate relevant text visible in the video frame.
 * Rendered by default.
 * As a bonus, suppressible via the context menu or something on a case-by-case basis.

When the problem is framed this way, the language of the text track doesn't need to be specified at all. In case #1 it is "same as audio". In case #2 it is "same as context site". This makes the text track selection mechanism super-simple (sketched below). Note that #2 isn't an accessibility feature, but addressing #2 right away avoids abuse of the #1 feature, which is for accessibility.
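To make the selection rule concrete, here is a rough sketch in TypeScript. It is purely illustrative; every type and function name below is made up for this message, not taken from any spec or implementation:

  // Hypothetical track metadata: the single flag the format would carry.
  type TrackKind = "captions" | "subtitles";

  interface TimedTextTrack {
    // "captions" = same language as the audio;
    // "subtitles" = language of the embedding site.
    kind: TrackKind;
  }

  // The whole selection rule: one browser-wide pref ("I am deaf or my
  // device doesn't do sound out") turns captions on; subtitles render
  // by default (and could be suppressed case by case via the UI).
  function shouldRender(track: TimedTextTrack,
                        userPrefersCaptions: boolean): boolean {
    if (track.kind === "captions") {
      return userPrefersCaptions; // case #1: off unless the pref is set
    }
    return true; // case #2: on by default
  }

Note that no language negotiation appears anywhere in the sketch; that's the point.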
> I think we need to understand exactly what we expect from the caption
> tracks before being able to suggest an optimal solution. If e.g. we
> want caption tracks with hyperlinks on a temporal basis and some more
> metadata around that which is machine readable, then an extension of
> CMML would make the most sense.

I would prefer Unicode data over bitmaps in order to allow captioning to be mined by search engines without OCR. In terms of defining the problem space and metadata modeling, I think we should aim for the two cases I outlined above instead of trying to cover more ground up front.

Personally, I'd be fine with a format with these features (a rough data model follows the list):

 * Metadata flag that tells whether the text track is captioning for the deaf or translation subtitles.
 * Sequence of plain-text Unicode strings (incl. forced line breaks and bidi marks) with the following data:
   - Time code when the string appears.
   - Time code when the string disappears.
   - Flag for positioning the string at the top of the frame instead of the bottom.
 * A way to do italics (or other emphasis for scripts for which italics is not applicable), but I think this feature isn't essential.
 * A guideline for estimating the amount of text appropriate to be shown at one time, and a matching rendering guideline for UAs. (This guideline should result in an amount of text that agrees with current TV best practices.)
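Restating that feature list as a sketch, again in TypeScript and again with every name invented purely for illustration:

  type TrackKind = "captions" | "subtitles"; // the single metadata flag

  interface Cue {
    start: number;   // time code when the string appears (seconds)
    end: number;     // time code when the string disappears (seconds)
    atTop: boolean;  // put the string at the top of the frame, not the bottom
    text: string;    // plain-text Unicode; may contain forced line breaks
                     // and bidi control characters
    // Non-essential extra: runs of `text` to emphasize (italics, or
    // whatever emphasis the script calls for), as character offsets.
    emphasis?: Array<{ from: number; to: number }>;
  }

  interface TimedTextTrack {
    kind: TrackKind; // captioning for the deaf vs. translation subtitles
    cues: Cue[];
  }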
It would be up to the UA to render the text at the bottom of the video frame in white sans-serif with a black outline.

I think it would be inappropriate to put hyperlinks in captioning for the deaf, because it would venture outside the space of accessibility and effectively hide some links from the non-deaf audience.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

Received on Monday, 8 October 2007 02:22:43 UTC