Collecting of data about alt usage from James Graham on 2008-04-15 (public-html@w3.org from April 2008)

From: James Graham <jg307@cam.ac.uk>
Date: Tue, 15 Apr 2008 18:26:48 +0100
To: HTML WG <public-html@w3.org>
Message-ID: <4804E558.2010503@cam.ac.uk>

In all the recent discussion about the usage of @alt, there has been little, if 
any, non-anecdotal data presented to back up various claims. This leads me to 
believe that such data might not exist, or might not be readily accessible (i.e. 
not behind for-money subscriptions). Therefore I have been wondering about the 
feasibility of collecting such data.

It is clear to me that such data collection cannot be entirely automatic; one 
needs a qualitative assessment of the goodness of a piece of replacement text as 
a substitute for the image it is intended to provide an alternative to. I also 
believe that in order to generate a significant amount of data, the survey will 
have to be distributed; in the absence of any financial incentive, it is hard to 
imagine any one person sifting through thousands of pages and classifying the 
goodness of the alt-text on each. On the other hand, there are enough people 
with some interest in accessibility that we could probably get a reasonable 
number of pages analyzed if even a small fraction of them put in 30 minutes or so.

My idea for data collection so far is to use a Firefox extension with the 
following behavior:

  - Get a URL from a central list and navigate to that page
  - Show the page in the main browser area (maybe with CSS disabled, maybe with 
all images disabled)
  - For each unique image on the page:
    - Show the image in a sidebar
    - Hide the image in the main content and replace it with its alt text 
(highlighted in some way), or communicate that the alt text is null or empty
    - Ask the user to classify the image
    - Ask the user how successfully the alt text replaces the image (more on 
this part below)
  - When the user has gone through the whole page submit their ratings, together 
with some automatically collected information about the images and the page, to 
the central server
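
As a rough illustration of the "for each unique image" step above, here is a 
minimal sketch in Python (not an actual Firefox extension). It assumes that 
"unique" means a distinct src value, and it distinguishes a missing alt 
attribute (null) from an empty one. The names ImageCollector, alt_state and 
unique_images are hypothetical.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collects each unique <img> (keyed by src) together with its alt text."""

    def __init__(self):
        super().__init__()
        # Maps src -> alt text; None means the alt attribute is missing.
        self.images = {}

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        if src is None or src in self.images:
            return  # skip images without a src, and repeats of the same src
        if "alt" in attributes:
            # A valueless attribute (<img alt>) is reported as None; treat as empty.
            self.images[src] = attributes["alt"] or ""
        else:
            self.images[src] = None


def alt_state(alt):
    """Classify the alt text as 'missing' (null), 'empty', or 'present'."""
    if alt is None:
        return "missing"
    if alt.strip() == "":
        return "empty"
    return "present"


def unique_images(html):
    """Return (src, alt, state) for each unique image, in document order."""
    collector = ImageCollector()
    collector.feed(html)
    return [(src, alt, alt_state(alt)) for src, alt in collector.images.items()]
```

The extension would then show each of these images in the sidebar in turn and 
ask for the classification and rating.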

I'm not sure exactly what information should be collected at this stage but I'm 
thinking:

(User supplied)
  - A classification for the image (photo, icon, advert, etc.)
  - A rating of the alt text as an alternative to the image (something like "No 
problem, the page doesn't lose any clarity", "Some information lost", "Page 
becomes meaningless")
(Automatically collected)
  - The dimensions of each image
  - The validity of the page
  - The performance of the page in an automatic accessibility checker?
  - The presence of any well known conformance/accessibility badges?
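
To make the list above concrete, here is one possible shape for the record 
sent to the central server, sketched in Python. The field names, the three 
rating identifiers and the JSON encoding are all assumptions for illustration; 
the accessibility-checker result and badge detection are left out because it 
is not yet clear how they would be measured.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Hypothetical identifiers for the three example ratings.
RATINGS = ("no_problem", "some_information_lost", "page_meaningless")


@dataclass
class ImageRating:
    src: str
    alt: Optional[str]    # None when the alt attribute is missing
    classification: str   # user supplied: "photo", "icon", "advert", ...
    rating: str           # user supplied: one of RATINGS
    width: int            # automatically collected, in pixels
    height: int           # automatically collected, in pixels

    def __post_init__(self):
        if self.rating not in RATINGS:
            raise ValueError("unknown rating: " + repr(self.rating))


@dataclass
class PageSubmission:
    url: str
    user_hash: str                       # hashed participant id
    page_is_valid: Optional[bool] = None  # None when validity was not checked
    images: List[ImageRating] = field(default_factory=list)

    def to_json(self):
        """Serialize the whole submission, including nested image ratings."""
        return json.dumps(asdict(self), sort_keys=True)
```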

Input on what to collect would be useful. Bear in mind that classifying a single 
image has to be very quick; if the average page has ~20 images people are 
unlikely to spend more than a few seconds on each.

I also believe that to do it in a distributed way it would be necessary to have 
ids for each user participating (though there would be no need to store more 
than just a hash). This would be used to remove certain systematic problems that 
might arise.
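
A minimal sketch of the hashed ids, in Python. The use of SHA-256, the 
server-side salt, and the function names are assumptions. Counting 
submissions per participant is just one example of what the hash would allow, 
e.g. noticing when a single participant contributes a disproportionate share 
of the ratings.

```python
import hashlib
import secrets
from collections import Counter


def new_participant_id():
    """Generate a random id that the extension would keep locally (assumed)."""
    return secrets.token_hex(16)


def hash_participant_id(participant_id, salt):
    """Return the only value the server would need to store for a participant."""
    digest = hashlib.sha256((salt + ":" + participant_id).encode("utf-8"))
    return digest.hexdigest()


def submissions_per_participant(user_hashes):
    """Count how many submissions each hashed participant id has made."""
    return dict(Counter(user_hashes))
```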

Anyway, I am extremely time limited at the moment so if this doesn't seem like a 
good idea, I won't bother working on it at all. If there is some interest, I 
might be able to find some time (but no promises ;) )

-- 
"Eternity's a terrible thought. I mean, where's it all going to end?"
  -- Tom Stoppard, Rosencrantz and Guildenstern are Dead
Received on Tuesday, 15 April 2008 17:27:25 UTC