RE: a11y TF CfC on resolution to support "Media Text Associations" change proposal for HTML issue 9 from Sean Hayes on 2010-04-03 (public-html-a11y@w3.org from April 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Sat, 3 Apr 2010 13:20:03 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: "public-html-a11y@w3.org" <public-html-a11y@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B911A47A1EF@DB3EX14MBXC301.europe.corp.microsoft.c>
OK, I'll leave discussion of topics not directly related to the media associations out for the time being.

On inherited style, I'll think about it some more. My initial thought was that this should work; on reflection I thought it would be cleaner if they were kept separate, but it may be possible to make it work. I'll see if I want to make any changes by the f2f, but don't let me hold it up. If necessary I'll post comments later.

#6 There has to be some kind of source, even if it is specified through a @src. The concept needs to be generic, rather than specific to what element or attribute defined it, so I'm OK with that aspect of your proposed text. The semi-open interval semantics is intrinsic to TTML semantics, so I'm not sure it's necessary to state that here, as it's not clear what interval you are talking about. Is this something to do with media fragment URIs?

" I think all we need to specify is that no matter what the situation, if a frame of the video is displayed which has an associated active text interval, the text of that text interval needs to be visible. I'm happy for that addition."
-- agreed.

" This was also addressed in the text that you refer to in [#1]: The default rendering area is a <div>-like area on top of the video or above the audio controls. We can tighten up this specification if you prefer."
Yes, I think this is necessary. The UA may have added bars if the aspect ratio of the div is different to the video, but generally captions will be authored with the assumption they will appear with respect to the "safe region" of the video. If this region is altered it will throw off measures in the TTML which might be used to place text with respect to elements in the video. If the captions are authored to appear on black bars that are in the actual video, then that's OK and up to the media author. That would still count as active video pixels; what I'm looking for is something that excludes padding added by the UA.
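
As a rough illustration of the "active video pixels" idea, the sketch below computes the rectangle the video itself occupies inside its box, excluding any bars the UA adds. It assumes the UA scales the video to the largest size that fits the box while preserving aspect ratio and centres it; the names Rect and activeVideoRect are hypothetical and do not come from any spec.

```typescript
// Hypothetical helper types and names, for illustration only.
interface Rect { x: number; y: number; width: number; height: number; }

// Assumption: the UA fits the video into the box at the largest size that
// preserves its aspect ratio and centres it, padding the rest with bars.
function activeVideoRect(boxW: number, boxH: number, videoW: number, videoH: number): Rect {
  const scale = Math.min(boxW / videoW, boxH / videoH);
  const width = videoW * scale;
  const height = videoH * scale;
  // The leftover space is split evenly on both sides (the UA-added bars).
  return { x: (boxW - width) / 2, y: (boxH - height) / 2, width, height };
}

// A 16:9 video in a 4:3 box: bars appear above and below the picture.
const active = activeVideoRect(640, 480, 1280, 720);
console.log(active);
```

Under that assumption, caption positions measured against this rectangle stay aligned with elements in the picture, whereas positions measured against the whole box would shift whenever the UA adds bars.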


"I am not sure how to pick the default height for the default rendering area for audio elements, though. Maybe there is a default from TV that could be re-used."
Digital radio might be a better place to look, as TV pretty much always assumes a picture, even if it's a still. 

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: Saturday, April 03, 2010 5:07 AM
To: Sean Hayes
Cc: public-html-a11y@w3.org
Subject: Re: a11y TF CfC on resolution to support "Media Text Associations" change proposal for HTML issue 9

Hi Sean,

Thanks for your feedback. Note that when we get to the HTML WG there will most certainly be more feedback/corrections, so it will be worthwhile following the discussion about the media proposals there, too.

I will try and address your concerns / suggestions. Please note that some fall outside the scope of the Media Associations proposal and thus need to be addressed differently. Others I believe are perfectly good additions. Let me elaborate inline.

On Fri, Apr 2, 2010 at 8:03 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> I have the following issues and additions
>
> [#1]
> I don't agree with the following section:
>
> <quote>
> This element provides a <div>-like area on top of the video above the controls or for the audio element above the audio controls, into which the text of the external resources is rendered.
>
> Depending on the role, the default styling of the <div>-like area will be different:
>
> caption, subtitle, lyrics, karaoke:
>  color: white;
>  background-color: #333333;
>  opacity:0.8;
>  text-align: center;
>  bottom: 0;
>  position:absolute;
> textual audio descriptions:
>  visibility: hidden; (unless this makes screen readers not read them out)
>  aria-live: assertive;
>  position: absolute;
>  z-index: -100; (or more - shouldn't be visible)
>
> </quote>
> Rationale:
> Any styling of caption content should match that of the video content, and not the player chrome defined by HTML. Thus such styling should be internal to the caption format (as in TTML). Initially I thought it would be appropriate to pass style down into the caption format, but I now think this would be problematic from a practical standpoint, as well as inappropriate.

SP:
You are partially right with this. Styling of captions can match that of the video content. Here, the assumption is that if no styling is given through either the caption file or the Web page author, then the browser has to be given a baseline styling. This is, for example, the case with srt, as you have noticed in [#5]. So, this exists to address your point [#5]. It can easily be overridden with styling from inside the caption file, or styling created by the Web page author.

Thus, the logical interpretation sequence of styling is:
1. this default styling
2. styling given in the caption file
3. styling given by the Web page author

It is probably useful to add this to the wiki page.
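
A minimal sketch of that sequence, assuming each later source simply overrides matching properties from the earlier ones. The Style type and the resolveStyle function are hypothetical names for illustration; the default values are the ones proposed for caption-like roles in the quoted text.

```typescript
// Hypothetical type and function names; not part of any proposal text.
type Style = Record<string, string>;

// Default styling from the quoted proposal for caption, subtitle, lyrics, karaoke.
const defaultCaptionStyle: Style = {
  "color": "white",
  "background-color": "#333333",
  "opacity": "0.8",
  "text-align": "center",
  "bottom": "0",
  "position": "absolute",
};

// Assumption: sources are applied in order, so later ones win:
// default styling < caption file styling < Web page author styling.
function resolveStyle(defaults: Style, fromCaptionFile: Style = {}, fromPageAuthor: Style = {}): Style {
  return { ...defaults, ...fromCaptionFile, ...fromPageAuthor };
}

const resolved = resolveStyle(
  defaultCaptionStyle,
  { "color": "yellow" },                      // styling carried in the caption file
  { "color": "cyan", "text-align": "left" },  // styling from the page author
);
console.log(resolved["color"], resolved["text-align"], resolved["opacity"]);
```

A format without styling, such as srt, would pass nothing for the caption file and so fall back to the defaults unless the page author overrides them.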


> [#2]
> Some indication in the spec that the text in a track should be available to assistive technology when visible (whether through ARIA or some intrinsic means) should be stated.

SP:
Yes, this is a sensible addition.


> [#3]
> A mechanism to associate a transcript is needed to satisfy WCAG. This proposal deals only with timed text, but transcripts are not timed and may include descriptive text as well as caption text (in order to be of use to users who are deaf-blind). In addition, a transcript would be of use in situations where the media is not fetched or played (as an alternative to the whole thing rather than an additional media track), so it may need to be in a track group with the video source. In the case where the web page is authored by the owner of the content, HTML is capable of defining such a static transcript, so we may not need much additional markup, but we still need a means to label such markup so that assistive technology can follow the association.
>
> Proposal: define a @transcriptFor attribute that <div> and <iframe> can use to point to a media element. Or define a new <transcript> element that can reference a <video> or <audio> element, and whose source can be inline HTML or referenced using a src attribute.

SP:
This is an orthogonal problem to the one addressed by Media Associations. The kind of transcripts that you are talking about are not synchronised with the media element. There are already techniques suggested in WCAG by which such transcripts can be provided, e.g.
"G58: Placing a link to the alternative for time-based media immediately next to the non-text content".

Another suggestion was made in the HTML WG a while back: provide the transcript underneath the media element in a <div> and add an "aria-describedby" attribute on the media element that links to the <div> http://www.w3.org/WAI/PF/aria/states_and_properties#aria-describedby .

I think your requirement above can be satisfied with these existing methods. Should you disagree, please make a different change proposal, since it is not related to the Media Associations proposal, which concerns itself only with time-aligned text.


> [#4]
> <audio> and text overlays are problematic if the <audio> element has no intrinsic width or height. This is not strictly required by WCAG, but is probably not to be prohibited.
>
> Proposal: Add width and height attributes to <audio> (default to zero)

SP:
I think it is already possible to add width and height attributes onto the <audio> element - if not directly, then definitely through CSS. If not, this may be something to discuss with the HTML WG. However, it is not relevant to the Media Associations proposal discussed here and would also require a different proposal.


> [#5] If we are going to use SRT, then we should define a standard style sheet for it. This is probably best done by defining a mapping into TTML.
>
> Proposal: I can define such a translation.

SP:
Already done - it's what your [#1] concern was about, see above. Feel free to propose different default styles if you disagree with the ones above.


> [#6] It should be made clear that the <track> element is slaved to the timeline of the currently active <source> element and that no synchronisation between track elements is implied.
>
> Proposal change:  "The text is displayed as the parent audio or video element goes through its time interval, i.e. the parent's currentTime has reached the start time of the interval but has not yet reached the end time of the interval (a semi-open interval: [start,end) ). "
>
> To
>
> "The text track is synchronised to the parent's active <source> audio or video media, tracks are not synchronised to each other. The text displayed is defined by the semantics of the referenced text format".

SP:
In principle, I am happy to expose more explicitly that the text track is synchronised to the active media resource - however, we cannot require it to be synchronised to the active <source>, since such an element may not exist and the active media resource may be defined in the "@src" attribute of <audio> or <video>.

Also, I don't think we need to explicitly state that tracks are not synchronised to each other - that text tracks should run in sync is a transitive truth and stating that they are not in sync will lead to confusion.

It is, however, important to explain that the time synchronisation for timed text elements is based on semi-open intervals, i.e. they start at the start time and disappear at the end time.

Thus, may I propose the following change, which should address your concern:

"The text track is synchronised to the timeline of the parent audio or video element's active resource, which is the only relevant timeline. The text in a text interval is displayed while the active resource's currentTime has reached the start time of the interval but has not yet reached the end time of the interval (a semi-open interval: [start,end) )."
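
The semi-open interval test can be sketched as follows. This is only an illustration: the TextInterval shape and the function names are hypothetical, and times are assumed to be seconds on the active resource's timeline.

```typescript
// Hypothetical shape for one timed text interval; not taken from any spec.
interface TextInterval { start: number; end: number; text: string; }

// Semi-open interval [start, end): active at the start time,
// no longer active once currentTime reaches the end time.
function isActive(interval: TextInterval, currentTime: number): boolean {
  return interval.start <= currentTime && currentTime < interval.end;
}

// Collect the text of every interval that is active at currentTime.
function activeText(intervals: TextInterval[], currentTime: number): string[] {
  return intervals.filter(i => isActive(i, currentTime)).map(i => i.text);
}

const intervals: TextInterval[] = [
  { start: 0, end: 2, text: "Hello" },
  { start: 2, end: 5, text: "World" },
];
// At t = 2 the first interval has already ended and the second has started.
console.log(activeText(intervals, 2));
```

One consequence of the semi-open form is that back-to-back intervals never overlap: at the shared boundary time exactly one of them is active.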


> [#6]
> Paused behaviour. The HTML spec says: " When a video element is paused and the current playback position is the first frame of video, the element represents either the frame of video corresponding to the current playback position or the poster frame, at the discretion of the user agent. "
>
> The timeline when paused will be at 0, thus any text media defined to be displayed at that time should be displayed, but not if the poster frame is displayed.

SP:
No, this is not true. The timeline pauses at the given offset, it does not pause at time 0. Also note that <audio> and <video> elements don't actually have to start at 0 - they could start at a time offset. The spec extract above avoids using "0" for this particular reason.

The above pause situation only refers to the video being paused at the first frame of the video. I do not think we have to worry about what browser vendors implement for displaying text during that first frame.
If they decide to display the poster frame, text overlay doesn't make sense. If they decide to display the video frame corresponding to the current playback position, then it would make sense to display the text overlay.

I think all we need to specify is that no matter what the situation, if a frame of the video is displayed which has an associated active text interval, the text of that text interval needs to be visible. I'm happy for that addition.
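
A small sketch of that rule, under the assumption that the UA reports whether it is showing the poster frame or an actual video frame. The DisplayedFrame and Cue types and the visibleCueText function are hypothetical names used only for illustration.

```typescript
// Hypothetical names throughout; a sketch of the rule, not spec text.
type DisplayedFrame = "poster" | "video";
interface Cue { start: number; end: number; text: string; }

// Text is shown only when an actual video frame is displayed and that
// frame's time falls inside an active text interval [start, end).
function visibleCueText(displayed: DisplayedFrame, cues: Cue[], currentTime: number): string[] {
  if (displayed === "poster") return []; // the poster frame gets no overlay
  return cues.filter(c => c.start <= currentTime && currentTime < c.end).map(c => c.text);
}

// The first frame need not be at time 0; here the timeline starts at an offset.
const firstFrameTime = 10;
const pausedCues: Cue[] = [{ start: 10, end: 12, text: "[music]" }];
console.log(visibleCueText("poster", pausedCues, firstFrameTime));
console.log(visibleCueText("video", pausedCues, firstFrameTime));
```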


> [#7]
> Need to specify the TTML default rendering area. This can be adapted from the language used for <video>; the default should be the active video pixels (not including any bars added by the UA). However, we may need to allow authorial control, e.g. for cases where the text is to be rendered outside of the video rectangle. Some consideration will be needed for the <audio> case.

SP:
This was also addressed in the text that you refer to in [#1]: The default rendering area is a <div>-like area on top of the video or above the audio controls. We can tighten up this specification if you prefer.

I'm not sure though if specifying that it should be above the "active video pixels (not including any bars added by the UA)" is the best way - having the text on top of black bars seems more useful than having the text obstruct video pixels. I believe the full display area of the video would be the default rendering area for the video.

I am not sure how to pick the default height for the default rendering area for audio elements, though. Maybe there is a default from TV that could be re-used.

Also, the Web page author should indeed have the possibility to change the size of this default display area, which I would think possible through CSS. We might want to drop a note about this in the wiki page.


Best Regards,
Silvia.
Received on Saturday, 3 April 2010 13:21:18 UTC