Re: Change Proposal for Issue 194 from Silvia Pfeiffer on 2012-05-23 (public-html-a11y@w3.org from May 2012)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Wed, 23 May 2012 16:53:14 +1000
To: Charles McCathieNevile <chaals@opera.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>, "Edward O'Connor" <eoconnor@apple.com>, Geoff Freed <geoff_freed@wgbh.org>, Laura Carlson <laura.lee.carlson@gmail.com>
Message-ID: <CAHp8n2nnEEeDk4fHUuQGZG9ioOM3HtQtSo_auKX7kQgdV29J_A@mail.gmail.com>
Some replies to all the comments received.

On Tue, May 22, 2012 at 7:58 PM, Charles McCathieNevile
<chaals@opera.com> wrote:
>
> First up: This meets the "I can live with it" test for me.

That is good to hear. John has given you some feedback already, so I
will just add where I have some further information.


> What happens if the transcript element is
> placed inside the video element? Does it still need the transcript
> attribute? Or does it *have* to be placed outside the element?

Putting anything inside the video element that is not a <track> or
<source> element turns it into fallback content for browsers that do
not support the video element. Thus, if the browser supports <video>,
the <transcript> will not be rendered. If the browser doesn't support
<video>, the <transcript> will be rendered in its place (though given
that such a browser likely doesn't understand the <transcript> element
either, it will simply render the content of the <transcript> element).


> IMHO (informed by people who would have to do this a lot) separating the two
> pieces and then having to put them together is something of an anti-pattern
> for authoring and maintenance, although for the sake of handling links to
> transcripts without duplication I think it is an acceptable compromise in
> this situation.

My main concern is to have a single solution for any type of
transcript, no matter whether it is rendered on-page or off-page. That
is easier to learn, to code up and to process automatically. I can
understand that optimizing for the link-only case (UC1) would lead us
to prefer a list of links inside the video element. However, I believe
that only provides the illusion of being simpler and of avoiding
duplication: if you want a link both in the video player and on the
page, then you have to duplicate that link. In the presented CP that
duplication does not happen.


> In the guidance about using the element to wrap links, it needs to be clear
> what happens if these are added for backwards compatibility and then hidden
> for design aesthetics. Experience suggests that people do that with anything
> they *think* is just for accessibility, and that they do it in ways that
> range from not very good to completely broken.

Agree with JF here that this should be authoring guidance and not
HTML5 spec text. We could recommend there that a hidden <transcript>
element, if presented with a media element on the same page, needs to
have a different visual representation on the page, preferably in the
video controls.


> There is a URL API being developed in the Web Apps group - messy editor's
> draft at http://dvcs.w3.org/hg/url/raw-file/tip/Overview.html but given this
> is relatively simple it should be finished before HTML 5. That might be
> better than rolling your own.

Interesting. I had no intention to roll our own URL API - I simply
copied the part of the <a> element IDL that is relevant for
<transcript>.


> Thanks for doing the work on pulling together, and thanks to everyone who
> worked to get a reasonable agreement.

Thanks for your input!


On Tue, May 22, 2012 at 9:54 PM, Laura Carlson
<laura.lee.carlson@gmail.com> wrote:
> It also claims to be better than aria-describedAt too. Please remove
> that bullet point, it isn't needed and may cause objections.

John has kindly already removed it. I'm happy with that.


On Tue, May 22, 2012 at 9:01 PM, Geoff Freed <geoff_freed@wgbh.org> wrote:
>
> One small point:  regarding interactive transcripts, I wouldn't classify them as either common or well-understood.  They aren't used *that* widely, nor are they particularly well-liked  (anecdotally speaking) in the caption-viewing community.  The motion that is often inherent in an interactive transcript-- either a moving box/highlighting region or scrolling text-- can be distracting and difficult to read when trying to watch a video.

I see it mostly as a solution for blind users and not for deaf users.

I can understand that they are less useful than captions to deaf and
HoH users, since they may be visually distracting. That is an
indication that browser default rendering should maybe provide a means
to hide the interactive transcript.

However, I can see the interactive transcript being extremely useful
to blind and VI users: they can scan through the text in the
transcript and click to start video/audio playback at the point in
time that they are interested in listening to. This is similar to
chapter markers, except that the full text transcript is used to
navigate the video rather than some (typically scant) chapter markers.
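
As a rough illustration of the "scan the text, then jump to that
point" idea, here is a minimal TypeScript sketch. It assumes the
transcript is available as a list of timed cues; the TranscriptCue
shape and the findStartTimes name are hypothetical and not part of
the CP.

```typescript
// Hypothetical minimal cue shape: a start time in seconds plus the cue text.
interface TranscriptCue {
  startTime: number;
  text: string;
}

// Return the start times of all cues whose text contains `query`
// (case-insensitive). An empty query matches nothing.
function findStartTimes(
  cues: ArrayLike<TranscriptCue>,
  query: string
): number[] {
  const needle = query.toLowerCase();
  const times: number[] = [];
  if (needle === "") {
    return times;
  }
  for (let i = 0; i < cues.length; i++) {
    if (cues[i].text.toLowerCase().indexOf(needle) !== -1) {
      times.push(cues[i].startTime);
    }
  }
  return times;
}
```

A page script could then assign one of the returned times to the
media element's currentTime to start playback from that point.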


On Wed, May 23, 2012 at 2:11 AM, John Foliot <john@foliot.ca> wrote:
>
> Some notes:
>
> 1) "Satisfying [UC1] linked transcripts"
>
> "Web browsers can render a menu on top of the video with the links off-page
> and the first line of text inside the transcript as a "label". (Or should we
> use an explicit label/legend element to provide this?)"
>
>
> If I can find 1 significant fault with this is that a) I don't recall
> discussing this, and b) we discussed (and I was quite vehement) that the
> "controlling switch" be rendered as such by the UI/User Agent as part of the
> video controls.
>
> From my [private] recap email that I forwarded to you, Janina and Chaals:
>
>        "I use [button] to mean an interactive control/switch which may
> change from device to device and platform to platform, but will always be a
> native browser control."
> (It was my understanding that we had consensus there)
>
> In this fashion, the semantics of [button]
> (http://www.w3.org/TR/html4/interact/forms.html#edef-BUTTON) and
> (http://www.w3.org/TR/wai-aria/roles#button) are conveyed to the end user
> (the consumption by choice piece), and yes it would have a label
> (aria-label), although at this point it could be "hard-wired" to the UI so
> that the author does not need to include the aria-label attribute.

Amongst this I found the suggestion to use aria-label to provide a
label for the video menu. I like this and suggest adding it to the
wiki page.

Now, back to the topic of "MUST display an interactive control":

> From your response to that same earlier email:
>
> JF:
>> So a clear part of the change proposal is to capture this
>> requirement, perhaps along the lines of "User-agents MUST expose the
>> presence of @transcript and the ability to consume the transcript to all
>> users" - I am neither a wordsmith nor a tech writer but you get the idea
>> (and yes, that was RFC 2119 MUST <smile>).
>
> SP:
> There is basic agreement on the <transcript> element and a @transcript
> attribute. I'm not sure we can go as far as having a MUST requirement
> on the controls getting a visual exposure on the video element for
> transcripts, but we can certainly recommend there to be one and we can
> require the transcript availability to be announce by AT.
>
> Silvia, if MUST is too strong, then SHOULD would be acceptable. Having the
> trigger as a [button] aids significantly to the "announced by AT"
> requirement, as well as the "R2: Choice to consume - the option to consume
> or not consume the transcript remains in the control of the user"
> requirement.
>
> Failing to capture this more directly in this CP is a significant flaw and
> (potential) deal-breaker (for me). Can we discuss how to better capture this
> intent/requirment in the CP?

The HTML5 spec does not make any MUST statement on how browsers need
to render user interfaces. Even the video controls only list a
recommended set of buttons and not a definitive list. This is because
there may be different mechanisms necessary on different devices and
there is design competition allowed between browsers. The introduction
of a transcript button is no different in this respect, which is why I
wrote "can render a menu" rather than "MUST". I would go as far as
adding transcript links to the list of buttons to be rendered in
browsers when controls are present.

It's here: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#attr-media-controls

"If the attribute is present, or if scripting is disabled for the
media element, then the user agent should expose a user interface to
the user. This user interface should include features to begin
playback, pause playback, seek to an arbitrary position in the content
(if the content supports arbitrary seeking), change the volume, change
the display of closed captions or embedded sign-language tracks"

I'd be happy to suggest adding "or transcripts" right there.

Is that sufficient for you?


> *****************
>
> 2) "Satisfying [UC3] interactive transcripts" & "Satisfying [UC4]
> transcript-only pages"
>
> <video transcript="t1" src=video.mp4></video>
> <transcript id=t1 width:200 height: 200>
>  <track src=transcript1.vtt srclang=en default>
>  <track src=transcript2.vtt srclang=de>
> </transcript>
>
>
> I'm not sure if this was simply a fat-finger issue, but I have concerns
> about the sizing syntax you are proposing here. If it is the 'old-school'
> HTML 4 sizing construct it should be:
>
>        <transcript id=t1 width="200" height="200">
>
> Conversely, if we do this in the CSS fashion, it should be:
>
>        <transcript id=t1 style="width:200; height:200;">
>
> ...or:
>
> transcript.t1 {
>   width:200;
>   height:200;
>  }
>
> I suspect that either should be good, but our code example(s) should reflect
> either (or both) - what I see is something new again (unless something else
> has slipped my attention completely). I've not made any edits (yet) but
> happy to do so.

I think you've fixed that. I'm happy with it.


> *****************
>
> 3) JF Edits to the Wiki:
>
> * Turned the "Satisfying" sections into level 3 headings for readability and
> improved (accessible) structural markup
> * Minor grammar edits (some spelling and some phrasing)
> * Added 2 examples of interactive transcripts for Section "Satisfying [UC3]
> interactive transcripts" (3PlayMedia and YouTube)
> * Removed "Is a better solution for long text alternatives than @longdesc or
> @aria-describedAt, since it also solves the interactive transcript need."
> Per comments/feedback from Chaals & Laura (I concur BTW)

I'm happy with these.


> *****************
>
> 4) Nits:
>
> a) "Author: Silvia Pfeiffer (Google) & the HTML Accessibility Task Force"
>
> I think that the contributions of Chaals, Janina and myself should be
> highlighted more, as at this time it was not the a11yTF that contributed
> here, but rather we 4 through close to 6 hours of teleconference. In
> particular, in the User Requirements section, you quoted from my recap email
> to Chaals almost verbatim <wink> ("Just a note for Charles that I am due to
> adapt my CP over the weekend with much of the below.") It also supports the
> final statements that I will be withdrawing my other Change Proposals to
> support this CP.  If we can get the [button] issue worked out I will
> withdraw those CPs this week, in advance of the May 24th date.

I had not had any feedback on the wiki page before emailing the list,
so I didn't want to falsely represent individuals who may still object.
I'm happy to explicitly enumerate everyone who contributed.


On Wed, May 23, 2012 at 6:41 AM, Edward O'Connor <eoconnor@apple.com> wrote:
>
> Overall there's much that I like in this proposal, though I'm
> unconvinced that minting a <transcript> element is needed or desirable.

A <transcript> element is indeed not explicitly necessary. Everything
could be done just with IDREFs to existing elements, except if we
wanted to allow introduction of a transcript from a WebVTT or TTML
file that is browser-rendered. The advantages of using an explicit
element are:

* no matter what format the transcripts take, they can all be found
inside the same element and are thus much easier for humans and
machines to discover

* there is no confusion between @transcript and @aria-describedAt/@longdesc

* it also allows linking to a transcript that is rendered in an iframe
from an off-page HTML page

* it *allows* extension to other formats of transcripts, such as
automated rendering of interactive transcripts.
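
To make the IDREF association concrete, here is a minimal TypeScript
sketch of how a transcript="" value could be resolved to its targets.
It assumes the attribute holds a whitespace-separated list of IDs; the
TranscriptTarget shape and both function names are hypothetical and
only illustrate the lookup, not any agreed spec text.

```typescript
// Hypothetical description of an element that transcript="" could reference.
interface TranscriptTarget {
  id: string;
  kind: "link" | "in-page" | "track";
}

// Split an attribute value into IDREF tokens, dropping empty tokens
// and duplicates while keeping the author's order.
function parseIdrefs(value: string): string[] {
  const ids: string[] = [];
  for (const token of value.split(/\s+/)) {
    if (token !== "" && ids.indexOf(token) === -1) {
      ids.push(token);
    }
  }
  return ids;
}

// Resolve each IDREF against the known targets; unknown IDs are skipped.
function resolveTranscripts(
  value: string,
  known: TranscriptTarget[]
): TranscriptTarget[] {
  const resolved: TranscriptTarget[] = [];
  for (const id of parseIdrefs(value)) {
    for (const target of known) {
      if (target.id === id) {
        resolved.push(target);
        break;
      }
    }
  }
  return resolved;
}
```

With a dedicated element, every resolved target is known to be a
transcript; with plain IDREFs to arbitrary elements, the kind of
target has to be inferred some other way.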


> It is unnecessary to mint a new element to satisfy UC1 (transcript as
> linked resource); it would be simpler for the IDREFs in transcript="" to
> directly point at the <a> that links to the transcript.

I was told that hidden <a> elements are a real problem, since they
gain keyboard focus. Putting the link into a <div>-like element avoids
this.


> It is also unnecessary to mint a new element to satisfy UC2 (in-page
> transcripts); as in the linked case, it would be simpler for authors to
> directly point at the <article> or <div> that already contains the
> transcript content.[1]

...or a <p> or a <iframe> or a <footer> or anything else really.
That's exactly what I am trying to avoid: transcripts being
semantically hidden by not being explicitly called out on the page.
Remove the video element from the page and you have lost all semantic
meaning of there being a transcript of a video element on the page.

Instead, with an explicit <transcript> element, you could even just
put a link to a video file for downloading on the page and still put a
<transcript> on the page:

<a href="video.avi">Download your video here</a>
<transcript src=transcript.html>Read the video transcript</transcript>

As a crawler, I can confidently interpret transcript.html as the
transcript - there is no chance of doing so if it's just in a <div>.


> Geoff wrote, on UC3 (interactive transcripts):
>
>> [R]egarding interactive transcripts, I wouldn't classify them as
>> either common or well-understood. They aren't used *that* widely, nor
>> are they particularly well-liked (anecdotally speaking) in the
>> caption-viewing community.
>
> I don't think we should try to address interactive transcripts at this
> time. It would be best to wait and let Web developers experiment with a
> variety of approaches. In the future, we can base a standard approach to
> interactive transcripts on the fruits of such experiments.

I've watched this space for more than 10 years and interactive
transcripts have become quite common - in fact so common that a
Wikipedia page actually defines what they are. Also note that I
personally receive about one request a month from a random user asking
for this feature, and my reply every time is "go and find a good Web
developer", which makes them hang their head and walk away.

Anyway, Web developers have been using interactive transcripts for a
long time. They provide the best usability for blind users, giving
them direct access to video points in time that they would otherwise
not be able to identify this clearly and quickly. A whole suite of
technologies is based on this feature alone: DAISY books. I don't
think you can argue that we don't yet know what they are and need more
experiments.

I can understand that it may not be the first feature that browsers
will want to implement for transcripts. Since the <track> element
parses into a TextTrackCueList, Web developers can just use these to
create polyfills for interactive transcripts for now. But I would like
to see a solution proposed for transcripts that is easily extensible
to interactive transcripts.
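
A minimal TypeScript sketch of the core of such a polyfill follows.
It assumes cues are sorted by start time and do not overlap; CueLike,
activeCueIndex and transcriptLines are hypothetical names. The cue
shape only uses the startTime, endTime and text values that cues in a
TextTrackCueList expose for WebVTT, so the functions stay independent
of the DOM.

```typescript
// Hypothetical minimal cue shape, mirroring startTime/endTime (seconds)
// and text as found on WebVTT cues.
interface CueLike {
  startTime: number;
  endTime: number;
  text: string;
}

// Return the index of the cue covering `time`, or -1 if none does.
// Assumes cues are sorted by startTime and do not overlap.
function activeCueIndex(cues: ArrayLike<CueLike>, time: number): number {
  for (let i = 0; i < cues.length; i++) {
    if (time >= cues[i].startTime && time < cues[i].endTime) {
      return i;
    }
  }
  return -1;
}

// Build the lines a page script could render as the transcript,
// prefixing the currently active line with "> " as a highlight marker.
function transcriptLines(cues: ArrayLike<CueLike>, time: number): string[] {
  const active = activeCueIndex(cues, time);
  const lines: string[] = [];
  for (let i = 0; i < cues.length; i++) {
    lines.push((i === active ? "> " : "  ") + cues[i].text);
  }
  return lines;
}
```

In a page, a script would call transcriptLines with the track's cues
and the media element's currentTime whenever playback time changes,
and re-render the highlighted line.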


> UC4 (unassociated transcript documents) is already addressed by existing
> markup. Since there is no media element with which to programmatically
> associate the transcript, there is no need to indicate in
> machine-readable form that the document is a transcript. Microformats,
> Microdata, or RDFa could be used, should additional semantics be needed.

The downside to using Microdata or RDFa markup is that there is no
explicit registration of what a value semantically means. For example,
@itemprop=transcript is nice markup - but would Google trust such
markup to identify a video transcript? Or is that value just
something random that a particular Web developer is using for their
meaning of "transcript" (e.g. a student's permanent record)?

I have provided an example above where the <transcript> element is
useful next to a link to a video file download. I think UC4 is a valid
use case.


Regards,
Silvia.
Received on Wednesday, 23 May 2012 06:54:25 UTC