Re: Moving longdesc forward: Recap, updates, consensus from Charles McCathieNevile on 2011-05-07 (public-html-a11y@w3.org from May 2011)

From: Charles McCathieNevile <chaals@opera.com>
Date: Sat, 07 May 2011 13:18:17 +0200
To: "Gregory J. Rosmaita" <oedipus@hicom.net>, "Benjamin Hawkes-Lewis" <bhawkeslewis@googlemail.com>
Cc: "Laura Carlson" <laura.lee.carlson@gmail.com>, "HTML Accessibility Task Force" <public-html-a11y@w3.org>
Message-ID: <op.vu3y0rm8wxe0ny@widsith.eng.oslo.osa>
On Fri, 06 May 2011 22:30:23 +0200, Benjamin Hawkes-Lewis  
<bhawkeslewis@googlemail.com> wrote:

> On Fri, May 6, 2011 at 4:37 PM, Gregory J. Rosmaita <oedipus@hicom.net>  
> wrote:
>> no, i am NOT ok with dropping the Hidden Metadata Fallacy -- this is
>> a fundamental philosophical/logical principle: that discoverable  
>> metadata is available for those who NEED it
>
> Nobody disputes that users need text alternatives, and HTML5 must  
> provide mechanisms for authors to provide them.
>
> The hidden metadata objection simply says that encouraging authors to  
> hide such text alternatives makes text alternatives more likely to be
> poor quality.

I accept that assertion as true. I also accept, as the chairs found, that  
it isn't very important in determining whether it should be possible to  
use hidden metadata...

> Here's Tantek's objection from the poll:
>
> "[@longdesc] is one of the worst forms of invisible metadata or "dark  
> data" which are known to rot and become inaccurate over time (see: meta  
> keywords, RDF in comments, sidefiles, etc.)."

Indeed, because it is an edge case: while the web is filled with icons,  
pictures of text and the like, the marginal value of longdesc for most of  
the web's images does not justify the cost of providing a substantial  
description (as opposed to a functional equivalent like alt, or textual  
role information like title).

However, for those images where the cost is justified, and where longdesc  
is used to provide dark metadata (it can equally be used to link to  
content present in the page, if the author chooses), the cost of "dark  
metadata" is often acceptable. Many of the examples appear to be  
relatively high-value material, backed by a relatively large investment  
in maintaining the data systems accurately. No mechanism should force  
two-bit part-time amateurs to build expensive and complex high-value  
systems to publish a photo of their cat. But the web is used a lot by  
organisations and individuals who do put a high value on their content  
and manage it with a significant investment. For them, the "dark  
metadata" argument is about as relevant as saving bottle-tops to recycle  
the metal as a source of income is to people like Tantek and Lachlan.

> Here's Lachlan's objection from poll:
>
> "There are arguments based on the idea that hidden metadata is useful  
> for some unspecified use case,

Actually, the use cases are specified.

> with the supposed problem that any additional information that may be
> referenced by longdesc, must be hidden from the vast majority of
> users, and not otherwise affect the design of the page.

This is not a supposed problem. It is an observed fact that some relevant  
proportion of thorough image descriptions will not be included in the  
content of a page.

It is also a well-understood principle of usability and accessibility that  
adding more content to a page can easily *decrease* its accessibility for  
a large number of users.

> "Such arguments are misguided, because
> it is far better for accessibility for such issues to be considered a
> prominent part of the page's design and/or user experience from the  
> outset,

That it is better for accessibility to be considered from the start is  
not, I believe, in dispute. However, this argument ignores the  
above-mentioned principle about not overloading the user with content,  
and is therefore fundamentally flawed: it would justify putting large  
amounts of content on pages, which can readily be shown to reduce their  
accessibility.

> and for the accessible content to be treated as a first class citizen
> of the site.

This argument assumes that first-class citizens are only those who are  
visible. It also assumes that content is never actively managed, when in  
fact it is only sometimes managed so minimally that anything not visible  
is forgotten. The reality of the Web is that many resources are managed  
far more carefully (although many are not - there's a lot of rubbish  
published too). In particular, longdesc (which is relatively expensive)  
is used more by those who apply more management.

> Hiding it behind the longdesc attribute, or any other similar method,
> effectively treats it as a second or even third class citizen

This is an unsubstantiated allegation about valuation of resources, used  
as a rhetorical device. It has no real bearing on the argument.

> which has been clearly demonstrated to result in suboptimal alternative
> content

I accept this statement, although the question of degree is highly  
relevant in judging the importance of the argument. It is known that  
accessibility of the web is very "sub-optimal" (woeful, on average, is  
probably a reasonable description - it's certainly what Hixie suggested as  
his experience of using JAWS) and yet it is very clearly appreciated by  
and valuable to many people who rely on such accessibility as there is.

> that never actually helps those it's intended to."

This is again a sweeping and unsubstantiated generalisation. It relies on  
the assumption that anything less than perfect is terrible, which is a  
logical fallacy and utter nonsense.

> http://www.w3.org/2002/09/wbs/40318/issue-30-objection-poll/results#xfigure
>
> In the Chairs' decision, they accepted that the tendency of invisible  
> metadata to rot could help "explain the bad data that has been seen",
> but treated this as a weak objection because only anecdotal evidence
> was provided that rotten data would cause implementors or users to give
> up on @longdesc and no evidence was provided that user agents could not
> exclude a lot of bad data.
>
> As such, I would expect a "hidden metadata fallacy" section to put
> forward arguments in favour of any of the following:
>
>   * Errors in visible data and hidden metadata are equally likely to be
>     corrected. (This seems obviously false.)

While "equal probability" is false, it is important to recognise that  
where information is managed (good libraries, well-run corporations,  
effective governments, organised and thoughtful individuals,  
community-supervised resources, etc etc) this argument becomes extremely  
weak, and its relevance rests on the fallacy Lachy assumed above that  
anything less than perfect is more harmful than helpful.

>   * In the special case of long text alternatives, authoring visible  
> data is likely to compromise quality more than hiding it because of the
> risk that authors write captions that assume you can see and make sense
> of the image.

I would express the argument in terms of the difficulty of clearly  
describing an image in text that fits the flow of a page in which it  
appears.

I would further note the point that overloading a page with information is  
shown not to increase, but to decrease its accessibility for many people.

>   * More authors will provide text alternatives if they can hide them,  
> and this is worth the accuracy cost of helping them to do so.

Indeed.

>   * User agents could exclude bad data.

User agents can trivially detect the most common form of bad data, a  
description in the attribute value instead of a URL, and make that text  
available to the user (implementing this takes much less time than has  
been spent on this thread). It is also trivially easy to find this fault  
in validation tools, authoring environments, content management systems,  
etc.

>   * Users will not give up on @longdesc if they repeatedly encounter bad
>     @longdesc values.
>
>   * User agents will not stop implementing @longdesc if users repeatedly
>     encounter bad @longdesc values.

These arguments rest on either common sense (which I don't trust), or  
significant amounts of study with large numbers of users and clear  
discussions with user agent developers. My sense is that the first will  
prove true in line with my experience, and the second will depend in large  
part on the outcome of this issue in the HTML WG.

>   * Better implementations will make it easier to discover @longdesc  
> visually, reducing the error rate due to it being hidden from the
> normal rendering of the page itself.

This follows as a necessary consequence of the hidden metadata argument  
itself.

> But instead, the "Hidden Metadata Fallacy" section lays siege to a  
> nonsensical straw man argument asserting that hidden metadata is not
> discoverable at all.
>
>> i, for one, think there is a crucial distinction between "hidden  
>> metadata" and "discoverable metadata"
>
> What's the distinction?

Actually, hidden is not a binary state, it depends on usage. This subtlety  
seems to have been lost to the "anti-longdescers", and is the principle  
which Gregory is trying to elucidate.

I will try to propose a new text for the section tomorrow which clarifies  
these arguments.

>> -- one could call "hidden" any metadata attached to an image (such as a  
>> JPEG file) which contains info about the camera, the resolution, and
>> other technical metadata because when the image is rendered, such
>> metadata is not immediately available to the user unless that metadata
>> is extracted or exposed using a specialized tool -- this is the entire
>> point of such initiatives as the RDF in Photo work within the W3C and
>> some of the other initiatives that laura has documented...
>
> In the context of a webpage, those are all hidden metadata.

No, some of them are presented (e.g. Flickr sometimes extracts them). But  
I don't think the argument is very strong (although it is valuable enough  
to explain), because those metadata can be generated automatically, and  
while they are indeed subject to quality degradation (e.g. when a photo  
manipulation tool fails to update them as appropriate), they are  
relatively easy to maintain.

Longdesc, by virtue of being difficult or impossible to generate by  
machine in most instances, is vulnerable to quality degradation whenever  
it is not getting attention from actual humans.

There is a lot of work on ways to support quality maintenance  
automatically, and it isn't all complicated. But it is a qualitatively  
different job.

Cheers

Chaals

-- 
Charles McCathieNevile  Opera Software, Standards Group
     je parle français -- hablo español -- jeg lærer norsk
http://my.opera.com/chaals       Try Opera: http://www.opera.com
Received on Saturday, 7 May 2011 11:19:03 UTC