RE: Why I don't attend the weekly teleconference (Was: Input on the agenda) from Ian Hickson on 2009-06-29 (public-html@w3.org from June 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 29 Jun 2009 23:20:30 +0000 (UTC)
To: Murray Maloney <murray@muzmo.com>
Cc: public-html@w3.org
Message-ID: <Pine.LNX.4.62.0906292205110.16244@hixie.dreamhostps.com>
On Mon, 29 Jun 2009, Murray Maloney wrote:
> At 09:01 PM 6/28/2009 +0000, Ian Hickson wrote:
> > On Sun, 28 Jun 2009, Murray Maloney wrote:
> > >
> > > I think that the public is largely stuck with whatever the browser 
> > > makers do.
> > 
> > In that case, my original statement stands. If we want to make the 
> > spec actually match what is implemented, and not be an especially dry 
> > work of science fiction, we have to write what the implementors want 
> > to implement. They do in fact have ultimate veto on the parts they 
> > implement.
> 
> Well, just because some/most/all browser implementations are not using a 
> few attributes does not make those attributes science fiction.

This isn't about browser implementations necessarily; it's about whatever 
implementations are relevant to the feature. In the case of "axis", 
"longdesc", or "summary", for instance, it might be ATs rather than 
browser vendors. In the case of "h1", it might be browsers, ATs, and 
search engines. In the case of "itemprop", it might be primarily data 
mining tools. Some requirements in the spec are only relevant to validator 
implementors.

However, if the relevant audience of implementors doesn't implement a 
feature, then I don't think it's hyperbole to call it science fiction. For 
instance, HTML4's <object declare> feature is not supported by any of the 
major browser vendors, any of the search engines, or any of the ATs. It 
isn't a feature that Web authors can use and actually have third-party 
software support, because nobody in fact supports it. It is, IMHO, science 
fiction.

My goal in writing the spec is to not have any features that are ignored 
by the relevant implementors (ATs, search engines, data mining tools, 
browsers, conformance checkers, whoever the particular requirement applies 
to). If the relevant implementors ignore the feature, then yes, it is 
science fiction.


> It is true that browsers can choose to ignore features within HTML. That 
> does not render the features obsolete to other kinds of tools, or user 
> agents. Can we agree on this much?

Absolutely.


> Can you agree that browsers are not the only viewports onto HTML?

Of course. The HTML5 spec lists six conformance classes explicitly, and 
these further break down into many more types of implementations.


> If the universe could provide you with evidence that there is sufficient
> useful data extant to justify the existence of longdesc and summary, how much
> data would that have to be?

I don't know, it's a judgement call.

In the case of summary="" and longdesc="", it's not so much the 
existence of good data that matters as the fraction of the total 
data that is good. (Psychological studies regarding what fraction of a 
user's experiences can be bad before the user stops being willing to risk 
the bad experiences in the hope of a good one would be helpful in guiding 
us here.) Also, if the bad data can be algorithmically filtered, then that 
makes the barrier lower.

The amount of bogus data in alt="" attributes is still high, but it's low 
enough for the attribute to be useful still. So that's probably the kind 
of bar we're looking at. I don't recall offhand exactly what the numbers 
are for alt="" usefulness, but it's still pretty low. (Low double digits 
percentage of the total number of images with alt="" at all? I forget.)


> If the Wall Street Journal and its sister news providers began providing 
> all of its feeds with useful AT metadata, would that tip the scale? What 
> if several state/national education systems were to make their curricula 
> to their students available with useful AT metadata? What if state/ 
> federal financial reports employed [...]

If even a single one could do this (organically, i.e. not just because 
someone interested in the outcome of this discussion convinced them to do 
it), and did so in a way that showed that summary="" data was better off 
hidden from non-AT users, that would certainly be significant.


> There's more than one way to skin a cat, and if all it takes is to get 
> somebody to turn on a bit of XSLT and populate a few web sites, then 
> maybe you can get your data and everybody will be happy. So, seriously, 
> how much data from a legitimate publisher would warrant a reversal of 
> your position.

Any data at all showing summary="" or longdesc="" being organically used 
in a useful manner specific to ATs and not other users would be 
significant. (So far, the examples of summary="" being used organically in 
a positive way have actually been cases where the text in the summary="" 
attribute would have been useful to non-AT users also.)


> > > Moreover, the proponents of both summary and longdesc disagree with 
> > > your assessment.
> > 
> > Disagreeing by assertion with the results of objective studies isn't "fair" 
> > either. I could assert that the financial markets have done nothing 
> > but grow in the past three years, but presumably you would dismiss 
> > such statements as groundless. This is, IMHO, no different.
> 
> Well, it's a bit different. Verifiable data about the financial markets 
> is available to all and a plethora of pundits and analysts are ever 
> ready to pronounce myriad opinions on the meaning of almost every data 
> point.

Verifiable data about the state of the Web is also available to all. 
Philip` didn't use a big datacenter to analyse pages to obtain the data he 
collected; he did it on his own machine.
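
To give a rough idea of how little machinery this kind of survey needs, 
here is a minimal sketch using only Python's standard library. It is not 
Philip`'s actual method; the names SummaryCollector and tally_summaries 
and the sample pages are made up for illustration. It tallies the values 
found in summary="" attributes on tables across a set of pages:

```python
from collections import Counter
from html.parser import HTMLParser


class SummaryCollector(HTMLParser):
    """Collects the value of every summary="" attribute found on a <table>."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "table":
            return
        for name, value in attrs:
            if name == "summary":
                # A bare attribute with no value is reported as None.
                self.values.append((value or "").strip())


def tally_summaries(pages):
    """Return a Counter of summary="" values across an iterable of HTML strings."""
    counts = Counter()
    for markup in pages:
        collector = SummaryCollector()
        collector.feed(markup)
        collector.close()
        counts.update(collector.values)
    return counts


# Hypothetical sample pages, invented for illustration only.
SAMPLE_PAGES = [
    '<table summary="layout"><tr><td>x</td></tr></table>',
    '<table summary=""><tr><td>y</td></tr></table>',
    '<TABLE SUMMARY="layout"><tr><td>z</td></tr></TABLE>',
    '<table><tr><td>no summary here</td></tr></table>',
]

if __name__ == "__main__":
    for value, count in tally_summaries(SAMPLE_PAGES).most_common():
        print(repr(value), count)
```

Deciding which of the collected values are actually useful still takes a 
human reading them, which is the judgement call discussed above.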


> With these AT attributes, the accessibility community is trying to 
> educate publishers and it is taking a long time.

I think fundamentally that approaches to accessibility that rely on 
education are basically doomed. We need to have accessibility be much more 
automatic than that. We need to make it easier to write accessible pages 
than to not do so, even for people who don't care about accessibility. 
This is why, for instance, we have separation of presentation from 
semantics as such a core feature in HTML5 (and HTML4) -- it's not called 
out as an accessibility feature, but it gets authors into the mindset of 
thinking of what they mean, not what they want it to look like, and that 
helps AT users.

When we _do_ still need to rely on education, IMHO we should do so in a 
way that leads to really simple rules.

Instead of:

   "Describe the structure of your table in a few sentences in the 
   summary="" attribute."

...have:

   "All tables should be explained in their <caption>."

This then helps both AT users _and_ users with cognitive difficulties 
_and_ users who aren't familiar with the subject matter _and_ is done in a 
way that is immediately verifiable by the author.

This seems to me to be a net win.

Similarly, instead of:

   "For important images, add a longdesc="" attribute with a link to a 
   page that describes the image."

...have:

   "Make sure important images are described in the prose."

...or:

   "For important images, add a link to a page that describes the image."

This way authors don't have to learn a new technique (longdesc=""), they 
can just continue using the techniques they use every day, like <a 
href="">. This leads to the information being available to everyone, not 
just ATs, _and_ leads to the author _seeing_ the information and thus 
increases the likelihood that the information will be reviewed.
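
As a hedged illustration of how simple rules like these could be checked 
mechanically, here is a small Python sketch. It is not part of any 
existing conformance checker; the names HiddenMetadataLinter and lint, 
the advice strings, and the sample markup are all assumptions made up 
for this example. It flags summary="" on tables and longdesc="" on 
images and points the author at the visible alternative:

```python
from html.parser import HTMLParser

# Advice paraphrased from the simple rules above; the mapping is illustrative.
VISIBLE_ALTERNATIVE = {
    ("table", "summary"): "explain the table in its <caption>",
    ("img", "longdesc"): "describe the image in the prose, or link to a "
                         "description with <a href>",
}


class HiddenMetadataLinter(HTMLParser):
    """Records uses of attributes whose content authors never see rendered."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        for name, _value in attrs:
            advice = VISIBLE_ALTERNATIVE.get((tag, name))
            if advice is not None:
                line, _column = self.getpos()
                self.findings.append((line, tag, name, advice))


def lint(markup):
    """Return a list of (line, tag, attribute, advice) tuples for one page."""
    linter = HiddenMetadataLinter()
    linter.feed(markup)
    linter.close()
    return linter.findings


# Hypothetical markup, invented for illustration only.
SAMPLE = """<table summary="Sales by region">
<caption>Sales by region, 2008</caption>
</table>
<img src="chart.png" alt="Chart" longdesc="chart-desc.html">
<p><a href="chart-desc.html">Description of the chart</a></p>
"""

if __name__ == "__main__":
    for line, tag, attribute, advice in lint(SAMPLE):
        print(f"line {line}: <{tag} {attribute}>: {advice}")
```

The visible <caption> and the ordinary link in the sample are not 
flagged, since the author sees both of them when reviewing the page.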


> [Frankly, the browser vendors could win a lot of good press by stepping 
> up with the big publishers and providing better accessibility for 
> everybody. A well-written table summary could help a lot of people, 
> especially in financial reports.]

I agree entirely. We should make these summaries available to everyone, 
though, not just AT users.


> > > I could agree that the publishing market has not yet adopted these 
> > > features as fully as the AT market and its supporters would have 
> > > liked.
> > 
> > The problem isn't lack of adoption so much as the 
> > overwhelmingly incorrect use of the features when they _are_ used.
> 
> Which relative percentages could be overcome tomorrow if the right 
> publishers flip a switch.

Agreed. But will that happen?


> But the technology has not matured due to a social problem, not a 
> technical failure.

If anything, that makes it worse -- social problems are far harder for us 
to fix than technical failures.

We can't just ignore the social problems, we need to route around them.


> > Consider another attribute, like "axis". This is an attribute intended 
> > for accessibility purposes, just like "summary". Is _it_ mature? 
> > Should we keep it? Drop it? Why?
> 
> Well, that's not fair either. :-) Axis/axes happens to be a favorite of 
> mine and was the subject of a chapter in an SGML book I completed for 
> Yuri Rubinsky in 1997. If browsers today were able to process axis/axes 
> and its use were adopted more widely it would aid the comprehension of 
> tables.

If. :-)


> I would keep it in because it costs you nothing to include a feature 
> that you do not expect many/any browsers to implement.

Every feature has a cost, e.g.:

 - documentation in the spec
 - writing of test cases
 - review of test cases
 - tutorials
 - time spent by authors determining if the feature can be used or not

We shouldn't ever make the mistake of assuming a feature costs nothing.


> If/when they do, users of such tables would benefit immensely.

If/when an implementation wants to have this data, then we can add it to 
the spec. In the meantime, if we have the feature but there are no 
implementations, the data in the attribute is just going to be bogus 
(because anything that doesn't get tested is much more likely to have 
unchecked and thus undiscovered errors).

If we keep the attribute but don't implement it, we'll never be able to 
implement it because the data will be polluted. This is, in fact, exactly 
what happened with summary="" and longdesc="" (except they did get 
implemented, just not in the tools that the authors used to test their 
pages).


> Until then, other processing tools would still be able to read and write 
> axis/axes values for their own purposes.

We have several features intended for UAs to use for their own purposes
without needing dedicated attributes (data-*="", microdata, etc).


> Can you agree that longdesc and summary are not in themselves faulty and 
> that the real problem is a social problem related to lack of useable 
> data.

Sure. The end result is the same, though.


> > Unfortunately, it has been demonstrated that this particular approach 
> > doesn't work in wide deployment on the Web, because, of the small 
> > fraction of the authoring base who specify these attributes, a large 
> > proportion specify useless values that are hard or impossible to 
> > programmatically distinguish from useful values, and thus these 
> > attributes in fact end up _not_ being easy for an application to 
> > ignore unless they ignore them wholesale (at which point the value of 
> > the attribute is lost, and we would be doing authors a favour by 
> > letting them know that providing the attribute at all is a waste of 
> > time).
> 
> But that argument will apply to whatever solution is proffered, won't 
> it.

It doesn't seem to apply to the proposed <caption> solution, since that 
would get seen by authors and thus reviewed, and thus not contain bogus 
data in anywhere near as many cases, and thus wouldn't need to be ignored 
in the first place.


> We can never be sure that a text input attribute will contain the right 
> information unless we so constrain the attribute as to make it unusable as 
> a general text container.

As you say, it's a social problem. We can dramatically increase the odds 
of the data being not bogus by making it visible to authors.

Anecdotally (I haven't got precise numbers to give on this, but it's based 
on the random studies I've done looking at the Web), text in attributes 
that aren't visible to authors at all (longdesc="", summary="") is largely 
bogus, text in attributes and features that are visible to authors if they 
go out of their way (title="", <title>) is bogus some of the time and 
useful some of the time, and text that is always visible (<h1>, <p>) is 
usually useful.

I would hypothesise that there is a direct correlation between the quality 
of data and the extent to which it is visible to the page's maintainer.

(I'm not the first or only one to suggest this; the idea that hidden 
metadata is usually inaccurate underlies a lot of the Microformats work, 
for instance.)


> > > Again, I am not suggesting enforcing anything. Rather, I am 
> > > suggesting enabling and empowering AT implementors.
> > 
> > I am suggesting they have ultimate power already. :-) If we specify 
> > something and they decide it's worthless, they're not going to 
> > implement it. (Witness the "axis" attribute in HTML4, for instance.)
> 
> Again, just because the browser doesn't use something, doesn't mean 
> nobody does.

I wasn't talking about browser vendors here, but about AT vendors.


> > > It is true that those attributes will be misused on some/many/most 
> > > HTML pages, just as other HTML attributes are often misused. But 
> > > that doesn't mean that it won't be useful when it is.
> > 
> > Actually, that's exactly what it means. When the overwhelming majority 
> > of the data is bogus, you cannot know when it is not, and thus even 
> > good values become useless.
> 
> You assert that I cannot know. But there are ways that I can know that a 
> given publisher, perhaps the one through whom I receive my Reader's 
> Digest, is providing me with useful data.

Granted, you could know from experience that a site has good data, and you 
might be tempted to check based on the claims of someone you trust. But as 
a general rule, when you go to a random Web page, you don't know, and 
can't know, and more importantly the user agent has no way to know and 
thus can't do anything on behalf of the user (e.g. automatically choosing 
whether or not to use the available data).


> Or perhaps I could use profile="http://www.at-enabled.org" (fictional) 
> to specify that I am promising to provide useful metadata. So it is 
> possible to place a seal of approval on a document.

If we had such a seal, it would be used by lots of people who didn't 
actually do the right thing. So it wouldn't actually tell you the quality 
of the aforementioned attributes.

For example, the HTML4 DOCTYPEs are used by people who don't follow the 
relevant DTDs an order of magnitude more often than by people who _do_ 
follow the relevant DTDs.


> > > That may not seem like a very satisfying engineering solution, and 
> > > it isn't. But so what? If it only helps a few people to read a good 
> > > book or a newspaper or their company newsletter, then haven't we 
> > > made the world a better place.
> > 
> > It's not that >99% of the relevant _users_ can't use these attributes 
> > and <1% of the relevant _users_ can use these attributes. It's that 
> > 100% of the users will find them useless >99% of the time, and they 
> > have no way to know ahead of time which the <1% of the cases are, and 
> > therefore they will act as if the attribute is useless 100% of the 
> > time.
> 
> No I won't. I will point at sites that I know I can read. I will be 
> disappointed that I can't read everything that my neighbour can read, 
> but my life will have improved, if only slightly. And that's just me. 

I want better than this. I want the blind user to be able to read all of 
the same data I can read.

I think we should be aiming for solutions that have a chance of improving 
the experience for AT users across more sites than the current solutions.


> Imagine how good a blind person will feel that he/she has access to an 
> ever growing corpus of useful data.

Imagine how good a blind person will feel when the rate of growth is twice 
what it is today!

(I don't think this line of argumentation is particularly useful.)


> > Furthermore, with both longdesc="" and summary="" there are ways to 
> > make the data available that don't in fact have any of these problems. 
> > You can provide an explicit link visible to all, and you can provide a 
> > summary of the table visible to all. These solutions would have 
> > significantly less bogus data (because the authors would see them), 
> > and so users would know that they are likely to be useful. It also 
> > provides these useful descriptions to all users (universal access), 
> > thus benefitting even users who might make use of such help despite 
> > being sighted (e.g. users with cognitive difficulties might find an 
> > introduction to a complex table useful despite being able to read the 
> > page fine, and in fact even "normal" users would sometimes find such 
> > help useful, as was demonstrated on some pages discussed a few weeks 
> > ago).
> 
> Well, I understand that we could use links. But I also know from 
> experience that authors are somewhat disinclined to make the effort to 
> create an ever increasing number of files which need to be managed 
> themselves, not to mention the task of managing the links.

How is that different to longdesc=""?


> I would be much happier to hear that HTML 5 would require that 
> stylesheet and programming content had to be linked to the document 
> rather than included in the HTML file. Moreover, I would like the option 
> to disable fetching the programs so that my pages would arrive sooner 
> and stop stealing cycles.

I don't understand what you are proposing here, or the relevance to this 
thread.


> > > And at what cost? Some HTML attributes that most browsers will 
> > > ignore and some will support.
> > 
> > The cost of these attributes is that people who _do_ want to help 
> > authors will spend time writing help text that will be ignored by many 
> > of their users. Instead of improving accessibility in ways that 
> > actually improve accessibility to many users, authors will think they 
> > have improved their site's accessibility while in fact having done 
> > little to truly help users.
> 
> I don't agree with your conclusion. It is a logical leap that is 
> unfounded.
> 
> As I have written, when a site publishes the fact that they are 
> employing AT attributes properly and the community discovers that it is 
> true, the users of that site will benefit. The fact that other sites do 
> not will not prevent me from reading a site that does.

I don't know about you, but this doesn't describe how I browse the Web. I 
don't limit myself to a few sites that have a community that I belong to. 
I go to hundreds of random unrelated sites each day, following links on 
news aggregators, e-mails, blogs, social networks, etc. In particular, 
sites where I deal with images and tables are not sites I visit regularly. 
There are no sites with many complex tables that I visit regularly [1] -- 
I might go to one government site today, and another government's site 
tomorrow, and they might have complex tables, but I have no way to know 
before I go there whether the site's claims of having written good 
summary="" attributes is true or not, and I have no way to connect with, 
or any interest in connecting with, the communities around those sites.

([1] with one exception, Wikipedia, but even if it had summaries, they 
would vary in quality from page to page)


> > Incidentally, expertise is not a license to skip reasoned arguments 
> > and research. There isn't anything special about experts in the HTML5 
> > development process; the only difference between an expert and a 
> > non-expert is that the experts will have an easier time explaining 
> > their arguments and obtaining supporting data. Experts have more 
> > influence than non-experts not because they are labeled "expert", but 
> > because they are more convincing in their arguments, they are more 
> > adept at finding relevant research, they have relationships with 
> > people who can provide corroborating evidence, and so forth.
> 
> Some people would say, and have said, that your own reasoning is flawed.
> 
> I agree that everyone should have a chance to make their case, and that 
> an "expert" does not get to make pronouncements without presenting their 
> arguments. I will note again that Alan Greenspan was not only considered 
> to be a world-class expert, he also testified regularly before congress 
> to present his arguments. But he was wrong. Not because he was 
> irrational, but because he neglected evidence that was being presented 
> by people who disagreed with his views. We, collectively, made a mistake 
> by accepting his wisdom.

Sure, even experts can be wrong. All the more reason to ensure that their 
opinions are defended by reasoned argument and data, just like everyone 
else's opinions.


> Today, I think that you are neglecting evidence, both technical and 
> social, as pertains to various parts of the HTML 5 specification. I keep 
> trying to figure out how to present the case in a new way so that you 
> can see what is so obvious to me and others, but I haven't figured it 
> out.

I feel the same way in reverse. :-)


> > > But we are being asked to be patient while a small community of 
> > > under-funded associations and companies build the critical mass of 
> > > technology and content to enable people to read or listen to an 
> > > increasing volume of useful content.
> > 
> > It should be noted that the proposals I have put forward for HTML5 
> > actually help with this, because they make it unnecessary for the AT 
> > vendors to expend any resources specifically on the issues of 
> > longdesc="" and summary="" (a big effort, especially considering the 
> > difficult problem of distinguishing useful values from useless 
> > values), allowing them instead to focus on features used on many 
> > pages, like links and captions.
> 
> Actually, your suggestion to conflate caption and longdesc is entirely 
> wrong from my perspective as a writer, typesetter and publisher.

(I think you are confusing longdesc="" and summary="" here. longdesc="" 
takes a URL to an external file describing an image, summary="" is an 
inline textual explanation of the table.)

I think that you are right, in that print works typically don't have 
summaries at all, and thus typographically it is odd to see the more 
elaborate captions that this would lead to.

But I don't think that makes it wrong, just different. This isn't a new 
thing; the Web and computers as a whole have made many changes to 
traditional typography (even more so further east, where entire alphabets 
have been changed to adapt to computer technology).


> Regardless of potential benefits that may derive from including a long 
> description of a picture in the caption, my first priorities would be 
> for the caption to satisfy the editorial requirements of a caption. For 
> the especial benefit of unsighted readers -- and possible spillover 
> benefit to sighted readers -- I would also want to provide a long 
> description of the picture if such were likely to aid comprehension. For 
> example, under a photo of Ted Nelson receiving an award, I would include 
> a caption like "Ted Nelson receiving award ... from ...", and a long 
> description which might read "Photo shows Theodor Holm Nelson receiving 
> the ... award from ... at the World Wide Web Conference in Brisbane, 
> Australia. Mr Nelson is smiling and shaking hands with ... as he 
> receives a multi-colored didjeridoo."
>
> If I included all of that information in the caption, I would be 
> breaking editorial rules for captions.

I think what you describe would be fine. I don't really mind if it's 
within the caption or legend of the image or table that the information is 
provided; my point is just that it shouldn't be in attributes that are 
hidden to non-AT users.


> Moreover, assuming that the picture clearly shows that Mr Nelson is 
> shaking hands and receiving a didjeridoo in front of a WWW banner, then 
> sighted readers will derive no extra benefit from reading the long 
> description.

Such an image probably wouldn't have a long description anyway. For images 
of this type, equivalent text in an alt="" attribute has worked moderately 
well. Long descriptions for images aren't often needed.


> > > As far as discovering useful content on the web, I am not in a 
> > > position to comment. I am willing to believe that longdesc and 
> > > summary are not used correctly in some/many/most pages today. But if 
> > > we drop these attributes and then adopt a new approach, the AT 
> > > developers will be set back again.
> > 
> > Not if they already support the new approach, as is the case for both 
> > links and captions, which are the solutions HTML5 encourages today.
> 
> These are not acceptable to me as an editor or manager of writers. It 
> just makes my job harder.

What you describe above -- including the information inline, just not in 
the caption -- is fine also. (It can even be explicitly linked to the 
table or image using ARIA attributes.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 29 June 2009 23:21:11 UTC