Re: alt and authoring practices

Bonner, Matt (IPG) wrote:
> MB> It seems like gathering data from various sources would advance this
> MB> debate more usefully than any amount of speculation on what might be.
> 
> IH> What data would you like me to collect?
> 
> Well, the data from a web crawl that seem germane would be
> along the lines of percentages of images for the oft-mentioned
> three cases:
> 
> . have no alt attribute
> . have an alt=""
> . have an alt="(a descriptive string)"
> 
> Obviously that still gives you no sense how often the alt
> text is useful, but it's a start.  [...]

Unlike Google I have no money so there's no danger of me wasting any on 
a survey, but I've already got a sample of about 130,000 pages from the 
list on dmoz.org, so I looked at that for free.

<img> elements with no alt attribute: 1104466 (47%)
<img> elements with zero-length alt: 530687 (23%)
<img> elements with non-empty whitespace-only alt: 11943 (1%)
<img> elements with non-empty non-whitespace alt: 702702 (30%)
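
For anyone who wants to repeat this on their own sample, the four buckets above could be counted roughly as follows. This is only a sketch and not the script that produced the figures above. The names `classify_alt` and `AltCounter` are made up for illustration, and it assumes "whitespace" means the HTML space characters (space, tab, LF, FF, CR) and that a valueless `alt` counts as zero-length.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: "whitespace" means the HTML space characters only.
HTML_WHITESPACE = " \t\n\f\r"


def classify_alt(attrs):
    """Return the bucket for one <img>, given its (name, value) attribute pairs."""
    for name, value in attrs:
        if name == "alt":
            # A bare `alt` with no value is reported as None; treat it as "".
            alt = value or ""
            if alt == "":
                return "zero-length"
            if alt.strip(HTML_WHITESPACE) == "":
                return "whitespace-only"
            return "non-empty"
    return "no alt"


class AltCounter(HTMLParser):
    """Tally <img> elements by the state of their alt attribute."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; <img/> also lands here.
        if tag == "img":
            self.counts[classify_alt(attrs)] += 1


if __name__ == "__main__":
    parser = AltCounter()
    parser.feed('<img src=a><img alt=""><img alt="  "><img alt="logo">')
    parser.close()
    total = sum(parser.counts.values())
    for bucket, n in parser.counts.most_common():
        print(f"{bucket}: {n} ({100 * n / total:.0f}%)")
```

Python's html.parser is more lenient than strict about broken markup but is not an HTML5 parser, so counts on real-world pages may differ slightly from what a browser-like parser would report.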

> On 17 Apr 2008, at 06:59, Karl Dubost wrote:
>> More challenging: distributions of "text". Collect all the text 
>> contained in alts, sort it, and then see which texts occur 
>> very often (I think about things like "logo" 
>> emerging, but there might be surprises).

http://philip.html5.org/data/common-alt-values.txt

(It's hard to tell much from that, since a single site with hundreds of 
pages listed on dmoz.org will significantly distort the results.)
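
One possible way to reduce that distortion, which is not what was done for the file above, would be to count each alt value at most once per host alongside the raw count. A rough sketch, assuming the data is available as (page URL, alt text) pairs; the function name `common_alts` and the example URLs are hypothetical:

```python
from collections import Counter
from urllib.parse import urlsplit


def common_alts(pairs):
    """Count alt values both per occurrence and per distinct host.

    `pairs` is an iterable of (page_url, alt_text) tuples (assumed input shape).
    """
    raw = Counter()        # every occurrence counts
    per_host = Counter()   # each (host, alt) combination counts once
    seen = set()
    for url, alt in pairs:
        raw[alt] += 1
        host = urlsplit(url).hostname or ""
        if (host, alt) not in seen:
            seen.add((host, alt))
            per_host[alt] += 1
    return raw, per_host


if __name__ == "__main__":
    sample = [
        ("http://a.example/1", "logo"),
        ("http://a.example/2", "logo"),
        ("http://b.example/", "logo"),
        ("http://a.example/3", "spacer"),
    ]
    raw, per_host = common_alts(sample)
    print(raw.most_common(3))
    print(per_host.most_common(3))
```

Comparing the two rankings gives a hint of which values are common across the web and which are common only because one site repeats them on hundreds of pages.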

> additional one:
>     Distribution of text lengths

http://philip.html5.org/data/alt-lengths.svg

(The longest were about 10,000 characters: 
http://www.coalitionforjustice.net (which looks like legitimate 
alternative text) and http://www.legnotre.com/ (which looks like search 
engine keyword spam). I cut the graph off much earlier, though, since 
very few are longer than ~200 characters and the long tail would only 
make the graph more boring.)
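
The data behind a graph like that can be tallied in a few lines. This is a sketch under assumptions rather than the code used for the SVG: the name `length_histogram` is invented, lengths are measured in Python string characters, and the cutoff of 200 just mirrors the figure mentioned above.

```python
from collections import Counter


def length_histogram(alts, cutoff=200):
    """Count alt texts by length, lumping everything above `cutoff` together."""
    hist = Counter()
    overflow = 0
    for alt in alts:
        n = len(alt)
        if n <= cutoff:
            hist[n] += 1
        else:
            overflow += 1  # the long tail that would be cut off the graph
    return hist, overflow


if __name__ == "__main__":
    hist, overflow = length_histogram(["", "logo", "home", "x" * 10000])
    for length in sorted(hist):
        print(length, hist[length])
    print("longer than cutoff:", overflow)
```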

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Wednesday, 16 April 2008 23:47:53 UTC