RE: Moving longdesc forward: Recap, updates, consensus from John Foliot on 2011-05-07 (public-html-a11y@w3.org from May 2011)

From: John Foliot <jfoliot@stanford.edu>
Date: Fri, 6 May 2011 18:36:00 -0700 (PDT)
To: "'Benjamin Hawkes-Lewis'" <bhawkeslewis@googlemail.com>, "'Gregory J. Rosmaita'" <oedipus@hicom.net>
Cc: "'Laura Carlson'" <laura.lee.carlson@gmail.com>, "'Charles McCathieNevile'" <chaals@opera.com>, "'HTML Accessibility Task Force'" <public-html-a11y@w3.org>
Message-ID: <012901cc0c57$1ea85c60$5bf91520$@edu>

Benjamin Hawkes-Lewis wrote:
>
>
> Nobody disputes that users need text alternatives, and HTML5 must
> provide mechanisms for authors to provide them.
>
> The hidden metadata objection simply says that encouraging authors to
> hide such text alternatives makes text alternatives more likely to be
> poor quality.

Ben,

The history of @longdesc goes back to a time when designers (not engineers) 
expressed a need to provide the functionality that non-sighted users 
required, without it impacting on the visual design of their page/content. 
Nobody has addressed this request/requirement, and nobody has suggested a 
better means of providing for this scenario.

You get no philosophical argument from me that keeping everything in the 
clear, in plain and readable text, is by far the most accessible means of 
conveying document information. However, you would be doing everyone a 
disservice if you did not admit that (especially in the early days of web 
accessibility) one of the most common complaints was that accessible web 
content had to be "plain and boring". In the late '90s 
(http://ncam.wgbh.org/webaccess/symbolwinner.html) we had the "D" link, and 
it was almost universally rejected as being a horrid idea 
(http://webaim.org/techniques/images/longdesc#d). Designers 'insisted' that 
HTML could do better than that.

If engineers and designers want to provide longer textual descriptions 
in-page alongside their complex images today, then I will both salute them 
and thank them. However, we have use cases and first-hand testimonials from 
some designers who flatly reject doing this on aesthetic grounds, and we 
require a means to address this need as well. Hacking away with ARIA and 
hidden <div>s is a retrograde and expensive[*] 'solution' to what is 
actually a fairly elegant if specialized option. WYSIWYG tools have made 
keeping longdesc content current and properly associated with its image a 
simple - even simplistic - task, and the continuing suggestion that broken 
hand-coded links and bogus values for @longdesc will keep propagating 
wildly across the web has no foundation in fact.

[* in an era where slicing even 10 bytes from a web-page equals hundreds of 
thousands of dollars a year for companies such as Google, transporting all 
that off-screen text "just in case" will meet with even more resistance than 
providing a user-option to follow a link to get that extra text]

>
> Here's Tantek's objection from the poll:
>
> "[@longdesc] is one of the worst forms of invisible metadata or "dark
> data" which are known to rot and become inaccurate over time (see: meta
> keywords, RDF in comments, sidefiles, etc.)."

Links rot; this is a known problem. I can point to literally hundreds of web 
pages where I work that have fallen into disuse and whose links no longer 
return good data. (For fun, here's one that has 5 broken "in the clear" 
links: http://infolab.stanford.edu/~sergey/ - enjoy the Java applet)

(Here's another for you: http://tantek.com/CSS/19980106.html - check the 
link to HyperCard... BTW, Tantek is also a friend, and I don't mean to pick 
on him.)

Link rot is a fact of the web, whether or not the links are referenced via 
@longdesc or via @src.


>
> Here's Lachlan's objection from poll:
>
> "There are arguments based on the idea that hidden metadata is useful
> for some unspecified use case, with the supposed problem that any
> additional information that may be referenced by longdesc, must be
> hidden from the vast majority of users, and not otherwise affect the
> design of the page.

Laura has now admirably countered this claim with multiple use cases.


>
> "Such arguments are misguided, because it is far better for
> accessibility for such issues to be considered a prominent part of the
> page's design and/or user experience from the outset, and for the
> accessible content to be treated as a first class citizen of the site.

With no disrespect to Lachlan, many, many working accessibility 
specialists - whose job each day is to work with people with disabilities 
and create content for those users - disagree.  I respect Lachlan's right to 
hold an opinion, but I will also echo back a phrase that has commonly been 
given to the accessibility community: that HTML5 should not be based simply 
upon the word of an expert. Yet here Lachlan is sounding very "expertly": he 
offers no proof of his assertions short of his version of "logic", and he 
lacks any real credentials to support the claim that he is an accessibility 
expert. I know Lachlan, I know Lachlan cares, and he wants the web to be 
accessible, but wanting that and espousing a personal philosophy are 2 
different things.


> Hiding it behind the longdesc attribute, or any other similar method,
> effectively treats it as a second or even third class citizen which has
> been clearly demonstrated to result in suboptimal alternative content
> that never actually helps those it's intended to."

There has been no clear demonstration of *anything* - we have a 4-year-old 
blog posting by Mark Pilgrim (http://blog.whatwg.org/the-longdesc-lottery) 
based upon 'confidential' Google data that no one can view or challenge. 
That's not proof, that's a manifesto.

I will happily concede that in 1999 - 2004 many authors were unaware of how 
to use @longdesc properly, and likely many documents that Ian crawled from 
that time period returned poor results. So what?  I can also point to likely 
just as many documents from that time-frame that had nested tables 14 levels 
deep, because authors back then didn't know any better about that either. 
What does that prove? It proves that author awareness and best practices 
have improved considerably over the past 5 years. I am extremely confident 
that, after all of the discussion that @longdesc and its place in HTML5 
have generated, lack of awareness of how to do it right will be a thing of 
the past. Moving forward, "bad" longdesc will be symptomatic of one of 2 
things: apathy or lack of education. One we can fix, the other, not so 
much.


> As such, I would expect a "hidden metadata fallacy" section to put
> forward arguments in favour of any of the following:
>
>   * Errors in visible data and hidden metadata are equally likely to be
>     corrected. (This seems obviously false.)

Obvious how? And what kind of "errors" are you talking about?


>
>   * In the special case of long text alternatives, authoring visible
> data is likely to compromise quality more than hiding it because of the
> risk that authors write captions that assume you can see and make sense of
> the image.

Captions are not long descriptions.

@longdesc takes a URL (which can be mechanically checked for "link rot"), 
and the content of the page at the other end is textual - usually an HTML 
file with little-to-no CSS styling, so reading it is quite simple. Does it 
take an extra author step to verify that the textual description matches 
the image that references it? Yes. That doesn't make it invisible; it only 
means that it might be subject to apathy.
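
As a rough illustration of what "mechanically checked" could look like, here 
is a minimal Python sketch. It assumes the page source is already in hand, 
collects every longdesc value on an <img>, resolves it against the page URL, 
and reports the targets that a caller-supplied probe says are unreachable. 
The names LongdescCollector, find_longdesc_targets, find_rotten and 
http_reachable are hypothetical, invented for this example; they do not 
come from any existing tool.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class LongdescCollector(HTMLParser):
    """Collect the resolved longdesc URL of every <img> that declares one."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        longdesc = dict(attrs).get("longdesc")
        if longdesc:
            # Relative references are resolved against the page's own URL.
            self.targets.append(urljoin(self.base_url, longdesc))


def find_longdesc_targets(html, base_url):
    """Return the absolute longdesc URLs found in an HTML string."""
    collector = LongdescCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.targets


def find_rotten(targets, is_reachable):
    """Return the targets for which the probe is_reachable(url) is false."""
    return [url for url in targets if not is_reachable(url)]


def http_reachable(url, timeout=10):
    """Hypothetical probe: True if fetching the URL succeeds (assumption:
    any error or 4xx/5xx response counts as rot)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        return False
```

Calling find_rotten(find_longdesc_targets(page, page_url), http_reachable) 
would list the dead description links for one page. A check like this only 
finds rot; whether the description still matches the image remains the 
author's step.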


>
>   * More authors will provide text alternatives if they can hide them,
>     and this is worth the accuracy cost of helping them to do so.

I believe that Laura's use cases, and the quote from Kyle Weems (CSSquirrel), 
address this point. Are you suggesting we need more "voices" making this 
assertion?


>
>   * User agents could exclude bad data.

Huh??


>
>   * Users will not give up on @longdesc if they repeatedly encounter
> bad @longdesc values.

This cannot be proven one way or the other: this requires a crystal ball 
that no one has.

What we can suggest, however, is that awareness of longdesc, and of the 
value it can offer users, is increasing, and with that awareness we will 
likely see both increased authoring and consumption 
(http://webaim.org/projects/screenreadersurvey3/#longdesc).


>
>   * User agents will not stop implementing @longdesc if users
> repeatedly encounter bad @longdesc values.

User agents don't "implement" @longdesc, authors do. User Agents support 
@longdesc; at least some do, some don't. I am not sure what you are saying 
here.


>
>   * Better implementations will make it easier to discover @longdesc
> visually, reducing the error rate due to it being hidden from the normal
> rendering of the page itself.

Better GUI User Agent support for discoverability will likely have this 
effect. This can be, and is, asserted, but it cannot be proven - it is a 
forward-looking statement.


>
> But instead, the "Hidden Metadata Fallacy" section lays siege to a
> nonsensical strawman argument asserting that hidden metadata is not
> discoverable at all.

What would you write then, Ben? If you *do* agree that the Hidden Metadata 
argument is a strawman, how would you better address that argument? The 
overall problem with strawman arguments is that there is no proof one way or 
the other, only assertions and an application of one's personal logic. 
However, this Working Group wants to ensure we do the best we can, so I 
think we are open to input.

If you accept that this fallacious strawman argument needs to be addressed 
either directly in the Change Proposal, or linked from the Change Proposal, 
what would you write instead?


>
> > -- one could call "hidden" any metadata attached to an image (such as
> > a JPEG file) which contains info about the camera, the resolution, and
> > other technical metadata because when the image is rendered, such
> > metadata is not immediately available to the user unless that metadata
> > is extracted or exposed using a specialized tool -- this is the entire
> > point of such initiatives as the RDF in Photo work within the W3C and
> > some of the other initiatives that laura has documented...
>
> In the context of a webpage, those are all hidden metadata.

...except...

Examining Gregory's statement, JPEG files have EXIF data associated with 
them: EXIF being *true* metadata in the traditional sense. Under most 
circumstances, the EXIF data associated with a JPEG file is not shown to the 
end user on the screen, because under most circumstances users don't care 
about that data.

But sometimes they do - Flickr being one such case: 
http://www.flickr.com/photos/benward/2109344239/
(This photo was taken on December 12, 2007 in Holborn, London, England, GB, 
using a Canon Digital IXUS 65.)

That last bit of data (date, location, camera) was exposed in the clear on 
the page I linked to, and is clearly not "hidden". It was discoverable.

Further, I can extract even more data using a specialized tool:
* JFIF Version 1.01
* Resolution 72 pixels/inch
* File Type JPEG
* MIME Type image/jpeg
* Comment AppleMark
* Encoding Process Baseline DCT, Huffman coding
* Bits Per Sample 8
* Color Components 3
* File Size 90 kB
* Image Size 480 × 640
* Y Cb Cr Sub Sampling YCbCr4:4:4 (1 1)

(source: 
http://regex.info/exif.cgi?dummy=on&imgurl=http%3A%2F%2Ffarm3.static.flickr.com%2F2369%2F2109344239_f2d139ce29_z.jpg) 
where again, that was all data "in the clear".
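
To show how little "specialized" such a tool needs to be, here is a minimal 
Python sketch that reads two of the fields listed above (JFIF version and 
resolution) straight from the first bytes of a JPEG. It is not the viewer 
linked above, and read_jfif_header is a hypothetical name made up for this 
example. It assumes the file begins with an SOI marker followed immediately 
by a JFIF APP0 segment, which is the standard JFIF layout.

```python
import struct

# Density unit codes defined for the JFIF APP0 segment.
UNIT_NAMES = {0: "aspect ratio only", 1: "pixels/inch", 2: "pixels/cm"}


def read_jfif_header(data):
    """Return version and resolution from the JFIF APP0 segment of a JPEG."""
    # SOI marker (FF D8) followed directly by the APP0 marker (FF E0).
    if len(data) < 18 or data[:4] != b"\xff\xd8\xff\xe0":
        raise ValueError("not a JFIF file: SOI/APP0 markers not found")
    # Big-endian fields: segment length, "JFIF\0" identifier, major and
    # minor version, density units, horizontal and vertical density.
    _length, ident, major, minor, units, x_density, y_density = struct.unpack(
        ">H5sBBBHH", data[4:18]
    )
    if ident != b"JFIF\x00":
        raise ValueError("APP0 segment does not carry a JFIF identifier")
    return {
        "version": f"{major}.{minor:02d}",
        "x_density": x_density,
        "y_density": y_density,
        "units": UNIT_NAMES.get(units, "unknown"),
    }
```

Given the raw bytes of an image, for example open(path, "rb").read(), the 
function returns a small dictionary; the data was in the file all along and 
only needed a reader.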

The problem with @longdesc today is not the attribute, it's the browsers' 
failure to make it discoverable to end users.


JF
Received on Saturday, 7 May 2011 01:36:30 UTC