
Re: alt and authoring practices

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Thu, 17 Apr 2008 00:44:58 +0100
Message-ID: <48068F7A.9060609@cam.ac.uk>
To: "Bonner, Matt (IPG)" <matt.bonner@hp.com>
CC: "public-html@w3.org" <public-html@w3.org>, "wai-xtech@w3.org" <wai-xtech@w3.org>, "wai-liaison@w3.org" <wai-liaison@w3.org>, Karl Dubost <karl@w3.org>

Bonner, Matt (IPG) wrote:
> MB> It seems like gathering data from various sources would advance this
> MB> debate more usefully than any amount of speculation on what might be.
> 
> IH> What data would you like me to collect?
> 
> Well, the data from a web crawl that seem germane would be
> along the lines of percentages of images for the oft-mentioned
> three cases:
> 
> . have no alt attribute
> . have an alt=""
> . have an alt="(a descriptive string)"
> 
> Obviously that still gives you no sense how often the alt
> text is useful, but it's a start.  [...]

Unlike Google I have no money so there's no danger of me wasting any on 
a survey, but I've already got a sample of about 130,000 pages from the 
list on dmoz.org, so I looked at that for free.

<img> elements with no alt attribute: 1104466 (47%)
<img> elements with zero-length alt: 530687 (23%)
<img> elements with non-empty whitespace-only alt: 11943 (1%)
<img> elements with non-empty non-whitespace alt: 702702 (30%)
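
For anyone wanting to repeat this on their own sample, the four buckets above can be computed roughly as follows. This is only a sketch using Python's standard html.parser, not the tool that produced the numbers above; the names classify_alt, AltCounter and count_alts are made up for illustration, and treating "whitespace" as HTML's ASCII whitespace set is an assumption.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: "whitespace-only" means only HTML ASCII whitespace characters.
HTML_WHITESPACE = " \t\n\f\r"


def classify_alt(attrs):
    """Return the bucket for one <img>, given its (name, value) attribute pairs."""
    for name, value in attrs:
        if name == "alt":
            # The first alt attribute wins, as in HTML attribute handling.
            if not value:  # alt="" or a bare alt attribute (value is None)
                return "zero-length"
            if value.strip(HTML_WHITESPACE) == "":
                return "whitespace-only"
            return "non-empty"
    return "missing"


class AltCounter(HTMLParser):
    """Counts <img> start tags by alt bucket."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "img":
            self.counts[classify_alt(attrs)] += 1


def count_alts(pages):
    """pages: iterable of HTML source strings. Returns a Counter of buckets."""
    total = Counter()
    for html in pages:
        parser = AltCounter()
        parser.feed(html)
        parser.close()
        total += parser.counts
    return total
```

Note that html.parser is more lenient than a real HTML5 parser in some cases and stricter in others, so counts on messy real-world pages may differ slightly from what a browser-grade parser would report.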

> On 17 Apr 2008, at 06:59, Karl Dubost wrote:
>> More challenging, distributions of "text", collect all the text 
>> contained in alts, sort them out, and then sees what are the text 
>> which are happening very often (I think about things like "logo" 
>> emerging, but there might be surprises).

http://philip.html5.org/data/common-alt-values.txt

(It's hard to tell much from that, since a single site with hundreds of 
pages listed on dmoz.org will significantly distort the results.)
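
One possible way to reduce that distortion (not something done for the file above) would be to count how many distinct hosts use each alt value, alongside the raw occurrence count. A minimal sketch, assuming the data is available as (page URL, alt text) pairs; the function name alt_value_counts and that input shape are hypothetical:

```python
from collections import Counter
from urllib.parse import urlsplit


def alt_value_counts(records):
    """records: iterable of (page_url, alt_text) pairs.

    Returns (raw, per_host): raw counts every occurrence of an alt value,
    per_host counts the number of distinct hosts on which it appears.
    """
    raw = Counter()
    hosts_seen = {}
    for url, alt in records:
        raw[alt] += 1
        host = urlsplit(url).hostname or ""
        hosts_seen.setdefault(alt, set()).add(host)
    per_host = Counter({alt: len(hosts) for alt, hosts in hosts_seen.items()})
    return raw, per_host
```

Comparing raw.most_common() against per_host.most_common() would show which values are popular across the web and which just come from one large site.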

> additional one:
>     Distribution of text lengths

http://philip.html5.org/data/alt-lengths.svg

(The longest were about 10,000 characters - 
http://www.coalitionforjustice.net (looks like actually legitimate 
alternative text) and http://www.legnotre.com/ (looks like search engine 
keyword spam) - but I cut the graph off much earlier, since very few are 
longer than ~200 characters and it makes the graph more boring.)
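
The length distribution with that kind of cut-off can be tallied along these lines. Again just a sketch: length_histogram and its cutoff parameter are illustrative names, and lengths are counted in Python string characters (code points), which is an assumption about what "characters" means here.

```python
from collections import Counter


def length_histogram(alts, cutoff=200):
    """alts: iterable of alt strings.

    Returns (hist, overflow): hist maps length -> number of alt values of
    that length, for lengths up to cutoff; overflow counts the longer ones.
    """
    hist = Counter()
    overflow = 0
    for alt in alts:
        n = len(alt)
        if n <= cutoff:
            hist[n] += 1
        else:
            overflow += 1
    return hist, overflow
```

Keeping the overflow count separate means the long tail is still reported even though it is left off the graph.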

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Wednesday, 16 April 2008 23:47:53 UTC
