RE: a11y TF CfC on resolution to support "Media Text Associations" change proposal for HTML issue 9 from Sean Hayes on 2010-04-04 (public-html-a11y@w3.org from April 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Sun, 4 Apr 2010 16:49:20 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: "public-html-a11y@w3.org" <public-html-a11y@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B911A49B5EF@DB3EX14MBXC301.europe.corp.microsoft.c>
For 1) I would suggest we define the semantics of SRT, and any other format that has no formal timing model, in terms of TTML (not that it has to be implemented that way, of course). This will fix the interval issue and probably a whole bunch of other stuff too.
e.g.

1
00:00:20,000 --> 00:00:24,400
In connection with a dramatic increase
in crime in certain neighbourhoods,
 
2 
00:00:24,600 --> 00:00:27,800 
the government is implementing a new policy...

===>

<?xml version="1.0" encoding="utf-8"?>
<tt xml:lang="en"
    xmlns="http://www.w3.org/ns/ttml">
  <body>
      <div>
	<p id="srt1" begin="20s" end="24.4s">In connection with a dramatic increase<br/>in crime in certain neighbourhoods,</p>
	<p id="srt2" begin="24.6s" end="27.8s">the government is implementing a new policy...</p>
     </div>
  </body>
</tt>
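
The mapping above can be sketched in code. This is only a rough illustration, assuming well-formed SRT cues separated by blank lines; the function names (srt_time_to_ms, ms_to_ttml, srt_to_ttml_paragraphs) are made up for the example and are not from any spec or library. It produces only the <p> elements, not the surrounding <tt>/<body>/<div>.

```python
import re
from xml.sax.saxutils import escape

# SRT timestamps look like HH:MM:SS,mmm
_SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_ms(stamp):
    """Convert one SRT timestamp to integer milliseconds."""
    match = _SRT_TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def ms_to_ttml(ms):
    """Format milliseconds as a TTML offset time in seconds, e.g. 24400 -> '24.4s'."""
    whole, frac = divmod(ms, 1000)
    if frac == 0:
        return f"{whole}s"
    return f"{whole}.{frac:03d}".rstrip("0") + "s"


def srt_to_ttml_paragraphs(srt_text):
    """Return one TTML <p> string per SRT cue (hypothetical helper).

    Assumption: each cue is 'index / timing line / one or more text lines',
    and cues are separated by blank (or whitespace-only) lines.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        if len(lines) < 3:
            continue  # skip malformed cues rather than guess
        index = lines[0]
        begin, end = (part.strip() for part in lines[1].split("-->"))
        # Line breaks inside a cue become <br/>, as in the example above.
        body = "<br/>".join(escape(line) for line in lines[2:])
        paragraphs.append(
            f'<p id="srt{index}" begin="{ms_to_ttml(srt_time_to_ms(begin))}" '
            f'end="{ms_to_ttml(srt_time_to_ms(end))}">{body}</p>'
        )
    return paragraphs
```

Working in integer milliseconds avoids floating-point rounding when the offsets are written back out.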

For 2) "Maybe a CSS attribute such as "letterbox: include/exclude"?" Making this an author choice is a good idea, but I wouldn't want to punt it to CSS. CSS controls the extent of the div, but this is slightly different.
 Maybe we can have an additional attribute on track:
@extent with values {media, container}

Extent="media" means put the origin of the root rendering area for that track at the top-left pixel of the video frame and, absent any information to the contrary in the TTML, extend it to the bottom-right pixel of the video frame. The video frame may contain black bars, but these are not the same as bars applied by the UA.

Extent="container" means make the root rendering area coincide exactly with the layout div (as you had it before); this would cause the TTML to render over any black bars or padding applied by the UA.
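
To make the two values concrete, here is a small sketch of how a UA might compute the root rendering area. It assumes the UA scales the video to fit inside the layout div while preserving aspect ratio and centres it, which is an assumption for illustration, not something the proposal states. The function name root_rendering_area is hypothetical.

```python
def root_rendering_area(extent, container_w, container_h, video_w, video_h):
    """Return (x, y, width, height) of the root rendering area,
    in layout-div coordinates.

    Assumption: the UA letterboxes/pillarboxes by scaling the video to fit
    the div and centring it; bars encoded in the video itself count as
    video pixels and are not excluded.
    """
    if extent == "container":
        # Coincides with the layout div, so it covers UA-added bars too.
        return (0.0, 0.0, float(container_w), float(container_h))
    if extent != "media":
        raise ValueError(f"unknown extent value: {extent!r}")

    # Largest scale at which the whole video frame still fits in the div.
    scale = min(container_w / video_w, container_h / video_h)
    frame_w = video_w * scale
    frame_h = video_h * scale
    # Origin is the top-left pixel of the scaled video frame.
    return ((container_w - frame_w) / 2, (container_h - frame_h) / 2, frame_w, frame_h)
```

For a 1280x720 video in a 640x480 div, "media" gives a 640x360 area offset 60px from the top, while "container" gives the full 640x480.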

3) DAB has a number of possible text associations, including full web pages, but it seems they haven't thought of captions or subtitles yet.  

Provided we allow that <audio> can have a rendering area, we can just give it the same default rendering area as the <video> element, which I believe is 300x150px. If authors want, they can get rid of it by making it 0 in either dimension using CSS; they won't be able to apply captions in that case, but perhaps they will have a transcript.

 Sean.

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: Saturday, April 03, 2010 3:24 PM
To: Sean Hayes
Cc: public-html-a11y@w3.org
Subject: Re: a11y TF CfC on resolution to support "Media Text Associations" change proposal for HTML issue 9

Hi Sean,

On Sun, Apr 4, 2010 at 12:20 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
>
> #6 <..> The semi-open interval semantics is intrinsic to TTML semantics, so I'm not sure it's necessary to state that here, as it's not clear what interval you are talking about.  Is this something to do with media fragment URIs?

SP:
Since the Media Text Associations proposal is not only referring to TTML, but also to SRT (and may in future refer to other text elements, too), there is a need to specify how to interpret the start and end time of the text intervals. Maybe the term "text interval" is not a good choice - maybe you prefer "text fragment"? Or have an even better proposal for how to identify the individual text segments that have a start and end time?
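
As an illustration of what semi-open interval semantics means for such text segments, here is a minimal sketch: a segment is active from its start time up to, but not including, its end time. The tuple layout and the name active_cues are hypothetical, and times are integer milliseconds.

```python
def active_cues(cues, t_ms):
    """Return the ids of cues active at time t_ms.

    Each cue is (cue_id, begin_ms, end_ms). The interval is semi-open,
    [begin, end): the begin instant is included, the end instant is not.
    """
    return [cue_id for cue_id, begin, end in cues if begin <= t_ms < end]


# Two back-to-back cues: at the shared boundary only the second is active,
# so there is never an instant where both (or neither) would be shown.
cues = [("srt1", 20000, 24400), ("srt2", 24400, 27800)]
```

With closed intervals both cues would be active at 24400ms; with the semi-open reading exactly one is.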


> " This was also addressed in the text that you refer to in [#1]: The default rendering area is a <div>-like area on top of the video or above the audio controls. We can tighten up this specification if you prefer."
> Yes, I think this is necessary. The UA may have added bars if the aspect ratio of the div is different to the video, but generally captions will be authored with the assumption they will appear with respect to the "safe region" of video. If this region is altered it will throw off measures in the TTML which might be used to place text with respect to elements in the video. If the captions are authored to appear on black bars that are in the actual video, then that's OK and up to the media author. That would still count as active video pixels; what I'm looking for is something that excludes padding added by the UA.

SP:
If we assume that TTML is authored towards a particular display style of the video, then that has to be specified somehow. If it is authored with the assumption of there not being black bars, but the black bars are - when the video is rendered - part of the video display area, then it's not displaying as expected.

Reading the TTML spec, there is reference to an externally defined "root container region", for which the extent can be changed in DFXP (i.e. width and height), but not the root coordinate. It might make sense to do this through CSS somehow - then it would also be applicable to SRT and other formats. Maybe a CSS attribute such as
"letterbox: include/exclude"? Not sure how to solve this, actually.


> "I am not sure how to pick the default height for the default rendering area for audio elements, though. Maybe there is a default from TV that could be re-used."
> Digital radio might be a better place to look, as TV pretty much always assumes a picture, even if it's a still.

I have listened to radio on a TV, but you are of course right. I've tried finding out how text works on DAB and mostly just came across a one-line scrolling text capability (as is being used in car radios).
That probably won't be sufficient for reading captions, so maybe something more like 3-4 lines or whatever is used on TV might be more appropriate. Again, keen on other people's opinions here.

Cheers,
Silvia.
Received on Sunday, 4 April 2010 16:50:00 UTC