Re: a11y TF CfC on resolution to support "Media Text Associations" change proposal for HTML issue 9

Hi Sean,

Thanks for your feedback. Note that when we get to the HTML WG there
will most certainly be more feedback/corrections, so it will be
worthwhile following the discussion about the media proposals there,
too.

I will try to address your concerns and suggestions. Please note that
some fall outside the scope of the Media Associations proposal and
thus need to be addressed differently. Others, I believe, are perfectly
good additions. Let me elaborate inline.

On Fri, Apr 2, 2010 at 8:03 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> I have the following issues and additions
>
> [#1]
> I don't agree with the following section:
>
> <quote>
> This element provides a <div>-like area on top of the video above the controls or for the audio element above the audio controls, into which the text of the external resources is rendered.
>
> Depending on the role, the default styling of the <div>-like area will be different:
>
> caption, subtitle, lyrics, karaoke:
>  color: white;
>  background-color: #333333;
>  opacity:0.8;
>  text-align: center;
>  bottom: 0;
>  position:absolute;
> textual audio descriptions:
>  visibility: hidden; (unless this makes screen readers not read them out)
>  aria-live: assertive;
>  position: absolute;
>  z-index: -100; (or more - shouldn't be visible)
>
> </quote>
> Rationale:
> Any styling of caption content should match that of the video content, and not the player chrome defined by HTML. Thus such styling should be internal to the caption format (as in TTML). Initially I thought it would be appropriate to pass style down into the caption format, but I now think this would be problematic from a practical standpoint, as well as inappropriate.

SP:
You are partially right about this. Styling of captions can match that
of the video content. The assumption here is that if no styling is
given through either the caption file or the Web page author, the
browser needs a baseline styling to fall back on. This is, for example,
the case with SRT, as you have noticed in [#5], so this default exists
to address that point. It can easily be overridden by styling from
inside the caption file or by styling created by the Web page author.

Thus, the logical interpretation sequence of styling is:
1. this default styling
2. styling given in the caption file
3. styling given by the Web page author

It is probably useful to add this to the wiki page.
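
To make the cascade concrete, here is a minimal sketch of how the
override could look in CSS. The selector ".caption-area" is purely
hypothetical - the proposal does not yet define how the rendering area
is addressed from author stylesheets - so treat this as an illustration
of the precedence order, not as defined syntax.

  /* 1. browser baseline for the caption rendering area
        (lowest precedence, per the defaults quoted in [#1]) */
  .caption-area {
    color: white;
    background-color: #333333;
    opacity: 0.8;
    text-align: center;
    position: absolute;
    bottom: 0;
  }

  /* 2. styling carried inside the caption file is applied by the
        user agent on top of this baseline */

  /* 3. Web page author styling overrides both */
  .caption-area {
    color: yellow;
    background-color: transparent;
    bottom: 10%;
  }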


> [#2]
> Some indication in the spec that the text in a track should be available to assistive technology when visible (whether through ARIA or some intrinsic means) should be stated.

SP:
Yes, this is a sensible addition.
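
As a rough illustration only (the markup and attribute choices are my
assumption, not part of the proposal): if the rendered text lives in a
<div>-like area, the user agent or author could expose it to assistive
technology with an ARIA live region, e.g.

  <!-- illustrative sketch: the currently active caption text is
       announced by screen readers as it changes -->
  <div class="caption-area" aria-live="polite">
    Text of the currently active caption interval.
  </div>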


> [#3]
> A mechanism to associate a transcript is needed to satisfy WCAG, this proposal deals only with timed text, but transcripts are not timed and may include descriptive text as well as caption text (in order to be of use to users who are deaf-blind) , in addition a transcript would be of use in situations where the media is not fetched or played (as an alternative to the whole thing rather than an additional media track), so it may need to be in a track group with the video source. In the case where the web page is authored by the owner of the content, then HTML is capable of defining such a static transcript, so we may not need much additional markup, but we still need a means to label such markup so that assistive technology can follow the association.
>
> Proposal: define a @transcriptFor attribute that<div> and <iframe> can use to point to a media element. Or define a new <transcript> element that can reference a <video> or <audio> element, and whose source can be inline HTML or referenced using a src attribute.

SP:
This is an orthogonal problem to the one addressed by Media
Associations. The kind of transcripts that you are talking about are
not synchronised with the media element. WCAG already suggests
techniques for providing such transcripts, e.g.
"G58: Placing a link to the alternative for time-based media
immediately next to the non-text content".

Another suggestion was made in the HTML WG a while back: provide the
transcript underneath the media element in a <div> and add an
"aria-describedby" attribute on the media element that links to that
<div>, see
http://www.w3.org/WAI/PF/aria/states_and_properties#aria-describedby .
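
For illustration, such a transcript association could look roughly
like the following (the element ids and file name are made up):

  <video src="lecture.webm" controls
         aria-describedby="transcript"></video>
  <div id="transcript">
    <h2>Transcript</h2>
    <p>[Full text of the audio, including descriptive text.]</p>
  </div>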

I think your requirement above can be satisfied with these existing
methods. Should you disagree, please make a different change proposal,
since it is not related to the Media Associations proposal, which
concerns itself only with time-aligned text.


> [#4]
> <audio> and text overlays are problematic if the <audio> element has no intrinsic width or height. This is not strictly required by WCAG, but is probably not to be prohibited.
>
> Proposal: Add width and height attributes to <audio> (default to zero)

SP:
I think it is already possible to give the <audio> element a width and
height - if not through attributes, then certainly through CSS. If
not, this may be something to discuss with the HTML WG. However, it is
not relevant to the Media Associations proposal discussed here and
would also require a different proposal.
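
For example, a page author can presumably already reserve space for an
overlay with CSS along these lines (the sizes are arbitrary, and
whether a given user agent honours them for <audio> is exactly the
kind of question to raise with the HTML WG):

  audio {
    width: 400px;
    height: 60px;
  }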


> [#5] If we are going to use SRT, then we should define a standard style sheet for it. This is probably best done by defining a mapping into TTML.
>
> Proposal: I can define such a translation.

SP:
Already done - it's what your [#1] concern was about, see above. Feel
free to propose different default styles if you disagree with the ones
above.


> [#6] It should be made clear that the <track> element is slaved to the timeline of the currently active <source> element and that no synchronisation between track elements is implied.
>
> Proposal change:  "The text is displayed as the parent audio or video element goes through its time interval, i.e. the parent's currentTime has reached the start time of the interval but has not yet reached the end time of the interval (a semi-open interval: [start,end) ). "
>
> To
>
> "The text track is synchronised to the parent's active <source> audio or video media, tracks are not synchronised to each other. The text displayed is defined by the semantics of the referenced text format".

SP:
In principle, I am happy to state more explicitly that the text track
is synchronised to the active media resource - however, we cannot
require it to be synchronised to the active <source>, since such an
element may not exist and the active media resource may be defined in
the "@src" attribute of <audio> or <video>.

Also, I don't think we need to state explicitly that tracks are not
synchronised to each other - since each track is synchronised to the
media resource's timeline, they end up in sync with each other
transitively, and stating that they are not would only lead to
confusion.

It is, however, important to explain that the time synchronisation
for timed text elements is based on semi-open intervals, i.e. the text
appears at the start time and disappears at the end time.

Thus, may I propose the following change, which should address your concern:

"The text track is synchronised to the timeline of the parent audio or
video element's active resource, which is the only relevant timeline.
The text of a text interval is displayed while the active resource's
currentTime has reached the start time of the interval but has not yet
reached its end time (a semi-open interval: [start,end) )."
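
For clarity, the display rule I have in mind amounts to the following
sketch (assuming a list of {start, end, text} intervals parsed from
the text resource; the function name is made up):

  <script>
    // semi-open interval: the text appears at start and is
    // removed again exactly at end
    function activeIntervals(intervals, currentTime) {
      return intervals.filter(function (i) {
        return i.start <= currentTime && currentTime < i.end;
      });
    }
  </script>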


> [#6]
> Paused behaviour. The HTML spec says: " When a video element is paused and the current playback position is the first frame of video, the element represents either the frame of video corresponding to the current playback position or the poster frame, at the discretion of the user agent. "
>
> The timeline when paused will be at 0, thus any text media defined to be displayed at that time should be displayed, but not if the poster frame is displayed.

SP:
No, this is not true. The timeline pauses at the current offset; it
does not pause at time 0. Also note that <audio> and <video> elements
don't actually have to start at 0 - they could start at a time offset.
The spec extract above avoids using "0" for this very reason.

The above pause situation only refers to the video being paused at the
first frame of the video. I do not think we have to worry about what
browser vendors implement for displaying text during that first frame.
If they decide to display the poster frame, text overlay doesn't make
sense. If they decide to display the video frame corresponding to the
current playback position, then it would make sense to display the
text overlay.

I think all we need to specify is that, no matter the situation, if a
video frame with an associated active text interval is displayed, the
text of that interval needs to be visible. I'm happy to add that.


> [#7]
> Need to specify the TTML default rendering area, this can be adapted from the language used for <video>, the default should be the active video pixels (not including any bars added by the UA) however we may need to allow authorial control , e.g. for cases where the text is to be rendered outside of the video rectangle. Some consideration will be needed for the <audio> case.

SP:
This was also addressed in the text that you refer to in [#1]: The
default rendering area is a <div>-like area on top of the video or
above the audio controls. We can tighten up this specification if you
prefer.

I'm not sure, though, that restricting it to the "active video pixels
(not including any bars added by the UA)" is the best approach -
having the text on top of black bars seems more useful than having it
obstruct video pixels. I believe the full display area of the video
element should be the default rendering area.

I am not sure how to pick the default height for the default rendering
area for audio elements, though. Maybe there is a default from TV that
could be re-used.

Also, the Web page author should indeed be able to change the size of
this default display area, which I would expect to be possible through
CSS. We might want to add a note about this to the wiki page.
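
As a sketch of what such author control could look like - again
assuming the rendering area can be addressed from author CSS via a
hypothetical ".caption-area" hook - the author might move the text
below the video rectangle instead of overlaying it:

  .caption-area {
    bottom: auto;
    top: 100%;   /* place the text just below the video box */
    height: 3em;
  }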


Best Regards,
Silvia.

Received on Saturday, 3 April 2010 04:07:35 UTC