[Fwd: Collecting of data about alt usage]

I guess people on the wai-xtech list might be interested in this idea. Please
make sure replies go to either me or public-html so that I see them :)

-------- Original Message --------
Subject: Collecting of data about alt usage
Date: Tue, 15 Apr 2008 18:26:48 +0100
From: James Graham <jg307@cam.ac.uk>
To: HTML WG <public-html@w3.org>

In all the recent discussion about the usage of @alt, little, if any,
non-anecdotal data has been presented to back up the various claims. This leads
me to believe that such data either does not exist or is not readily accessible
(e.g. it sits behind paid subscriptions). I have therefore been wondering about
the feasibility of collecting such data.

It is clear to me that such data collection cannot be entirely automatic; one
needs a qualitative assessment of the goodness of a piece of replacement text as
a substitute for the image it is intended to provide an alternative to. I also
believe that in order to generate a significant amount of data, the survey will
have to be distributed; in the absence of any financial incentive, it is hard to
imagine any one person sifting through thousands of pages and classifying the
goodness of the alt-text on each. On the other hand, there are enough people
with some interest in accessibility that we could probably get a reasonable
number of pages analyzed if even a small fraction of them put in 30 minutes or so.

My idea for data collection so far is to use a Firefox extension with the
following behavior:

  - Get a URL from a central list and navigate to that page
  - Show the page in the main browser area (maybe with CSS disabled, maybe with
all images disabled)
  - For each unique image on the page:
    - Show the image in a sidebar
    - Hide the image in the main content and replace it with its alt text
(highlighted in some way), or communicate that the alt text is null or empty
    - Ask the user to classify the image
    - Ask the user how successfully the alt text replaces the image (more on
this part below)
  - When the user has gone through the whole page submit their ratings, together
with some automatically collected information about the images and the page, to
the central server
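
The per-image part of the steps above could be sketched roughly as follows. This
is only an illustration in Python rather than extension code; the Image type,
the choice to treat images with the same src as "the same image", and the
placeholder strings are all assumptions of the sketch, not part of the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Image:
    """Hypothetical record for one <img> found on the page."""
    src: str
    alt: Optional[str]  # None means the alt attribute is missing entirely


def unique_images(images: List[Image]) -> List[Image]:
    """Keep the first occurrence of each image, keyed by src (an assumption)."""
    seen = set()
    result = []
    for img in images:
        if img.src not in seen:
            seen.add(img.src)
            result.append(img)
    return result


def alt_placeholder(img: Image) -> str:
    """Text shown in the page in place of the image during review."""
    if img.alt is None:
        return "[no alt attribute]"
    if img.alt.strip() == "":
        return "[empty alt]"
    return "[alt: " + img.alt + "]"
```

Keeping the missing and empty cases distinct matters here, because an empty alt
may be a deliberate choice while a missing one usually is not.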

I'm not sure exactly what information should be collected at this stage but I'm
thinking:

(User supplied)
  - A classification for the image (photo, icon, advert, etc.)
  - A rating of the alt text as an alternative to the image (something like "No
problem, the page doesn't lose any clarity", "Some information lost", "Page
becomes meaningless")
(Automatically collected)
  - The dimensions of each image
  - The validity of the page
  - The performance of the page in an automatic accessibility checker?
  - The presence of any well-known conformance/accessibility badges?
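
To make the list above a little more concrete, here is a minimal sketch of what
one submitted record might look like. The field names, the rating identifiers
and the JSON encoding are assumptions made for illustration; the checker result
and badge detection are left out because it is not yet clear how they would be
measured.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Hypothetical identifiers for the three example ratings quoted above.
RATINGS = ("no_problem", "some_information_lost", "page_becomes_meaningless")


@dataclass
class ImageReport:
    src: str
    alt: Optional[str]   # None if the attribute is absent
    category: str        # user supplied: "photo", "icon", "advert", ...
    rating: str          # user supplied: one of RATINGS
    width: int           # automatically collected
    height: int          # automatically collected

    def __post_init__(self):
        if self.rating not in RATINGS:
            raise ValueError("unknown rating: " + self.rating)


@dataclass
class PageReport:
    url: str
    page_is_valid: Optional[bool]  # None if validation was not run
    images: List[ImageReport] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialise the whole page report for submission to the server."""
        return json.dumps(asdict(self), sort_keys=True)
```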

Input on what to collect would be useful. Bear in mind that classifying a single
image has to be very quick; if the average page has ~20 images people are
unlikely to spend more than a few seconds on each.

I also believe that to do it in a distributed way it would be necessary to have
ids for each user participating (though there would be no need to store more
than just a hash). This would be used to remove certain systematic problems that
might arise.
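
As a rough sketch of the "just a hash" idea, assuming the server holds a secret
key and that ratings arrive as (participant key, rating) pairs; the function
names and the use of HMAC-SHA256 are choices made for this example only:

```python
import hashlib
import hmac
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def participant_key(user_id: str, secret: str) -> str:
    """Derive a stable, opaque key so the raw id never needs to be stored."""
    return hmac.new(
        secret.encode("utf-8"), user_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def ratings_by_participant(
    rows: Iterable[Tuple[str, str]]
) -> Dict[str, Counter]:
    """Count how often each participant gave each rating."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for key, rating in rows:
        counts[key][rating] += 1
    return dict(counts)
```

Per-participant counts like these would make it possible to spot, for example,
someone whose ratings are heavily skewed compared with everyone else's.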

Anyway, I am extremely time limited at the moment so if this doesn't seem like a
good idea, I won't bother working on it at all. If there is some interest, I
might be able to find some time (but no promises ;) )

-- 
"Eternity's a terrible thought. I mean, where's it all going to end?"
  -- Tom Stoppard, Rosencrantz and Guildenstern are Dead

Received on Wednesday, 16 April 2008 11:50:56 UTC