Re: alt crazyness (Re: alt and authoring practices) from Smylers on 2008-05-03 (public-html@w3.org from May 2008)

From: Smylers <Smylers@stripey.com>
Date: Sat, 3 May 2008 14:44:02 +0100
To: "public-html@w3.org" <public-html@w3.org>
Message-ID: <20080503134402.GC23967@stripey.com>
Olivier GENDRIN writes:

> On Sat, May 3, 2008 at 11:11 AM, Smylers <Smylers@stripey.com> wrote:
> 
> > Daniel Glazman writes:
> > 
> > > 1. making alt optional in HTML 5 is ridiculous
> >
> > I don't think that's really an argument.  But if it is then I'm
> > going to rebut it with:
> >
> >   Making alt compulsory in all circumstances is ridiculous.
> >
> > In particular, it doesn't make sense to mandate that the HTML author
> > provide alt text for an image when she doesn't know what it is.
> 
> Just as it doesn't make sense to mandate that the HTML author will
> respect the rest of the spec...

True.  But in all _other_ cases where a webpage is invalid we can state
how the author should have marked up the content available to her to
produce something that is valid.

Whereas in the case where the author doesn't know the image she's
including there's nothing she can do.  Even an author who wants to meet
the spec wouldn't be able to do so if it mandated unknown alt text.

> We could also make conformance optional, and in fact it is optional,
> insofar as UAs have processes for handling non-conformance.

There are some spec violations which the spec defines how browsers
should handle.  There are other spec violations which produce
syntactically valid HTML (such as using <h1> just to make text bigger,
or using alt="" on an important image) which a browser won't correct.

> >  > 3. when I read something like "When the alt attribute is missing,
> >  >    the image represents a key part of the content. Non-visual
> >  >    user agents should apply image analysis heuristics to help the
> >  >    user make sense of the image.", I can't believe my eyes...
> >
> >  Why?  That sounds entirely plausible to me.
> 
> Because if the author is not aware of @alt, he won't use it for
> content images nor for illustration images.

That's true.  An author can't use something he isn't aware of.  But I
fail to see how an author could possibly read the part of the spec which
explains the narrow circumstances in which it is accepted that no
plausible alt attribute can be provided, and yet remain unaware of the
alt attribute.

Clearly authors in ignorance of the spec are likely not to meet its
requirements.

> So the image is more likely not to be a 'key part of the content'.

OK.  Consider how a text-only browser should treat each of the
following:

* an image for which it has no alt text but which it knows to be
  significant content

* an image for which it has no alt text and which might be significant
  content or might not

What would you suggest it do differently with the second?  Given that it
might be an instance of the first it can't ignore it entirely.

> And if image analysis heuristics were effective, CAPTCHAs would be
> abandoned.

That does appear to be happening:

  http://www.theregister.co.uk/2008/04/14/msn_captcha_breaking/

It might be that the only heuristic is to say '[image]', or give its
filename or dimensions.

In this unfortunate situation, where we have to synthesize alternative
content without a human seeing the image, there are three places where
it could be done:

  1 It could be left up to HTML authors to come up with _something_; the
    spec says it doesn't matter what, so long as there is an
    alternative.

  2 The spec could mandate a formula for specific alt text, such as
    "[unknown image 000_0372.JPG 816 x 616 px]".

  3 It could be left up to developers of user-agents that don't display
    images to work out the most appropriate behaviour for their users
    (possibly providing configuration options for users to pick for
    themselves).

Option 2 is a poor choice because it doesn't keep pace with technology,
and it prevents browser developers from innovating better ways of
synthesizing unknown alt text.

Option 3 is superior to option 1 because there is more incentive for
developers (and users) to get this right than there is for authors to do
so; the decision is being made by those who compete on this stuff, whose
vested interest is in creating the best possible user experience for
those without images.
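To make option 3 a little more concrete, here is a minimal Python sketch of the sort of fallback a text-only user agent might apply, assuming it has nothing to go on but the <img> element's own attributes.  The names fallback_text and ImgCollector are hypothetical, and the "[image filename W x H px]" format merely borrows the example from option 2; a real browser would be free to do something cleverer (or let users configure it).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath


def fallback_text(attrs: dict) -> str:
    """Hypothetical text-mode rendering of a single <img> element."""
    alt = attrs.get("alt")
    if alt is not None:
        # Author-supplied alt wins; alt="" yields "" (treated as decorative).
        return alt
    # No alt at all: synthesize something from filename and dimensions.
    src = attrs.get("src") or ""
    name = posixpath.basename(urlparse(src).path) or "unknown"
    width, height = attrs.get("width"), attrs.get("height")
    size = f" {width} x {height} px" if width and height else ""
    return f"[image {name}{size}]"


class ImgCollector(HTMLParser):
    """Collects the text a hypothetical text-only UA would show per <img>."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "img":
            self.out.append(fallback_text(dict(attrs)))


parser = ImgCollector()
parser.feed('<img src="/photos/000_0372.JPG" width="816" height="616">'
            '<img src="logo.png" alt="Example Corp">')
parser.close()
print(parser.out)
```

The point of putting this in the user agent rather than in the markup is that the formula above can be improved (or replaced by genuine image analysis) without any page having to change.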

> And the content would be tainted by the result of the image analysis.
> A single image can have thousands of meanings; which one will the
> image analysis choose? Will it also have to analyse the context to
> guess a probable meaning?

Sure, that's unfortunate.  But we're in a situation where that data
simply doesn't exist; it's far from ideal, but it happens.  The HTML 5
spec can't magic that data out of nowhere, so heuristics are the best
that can be done.

The only questions are where's the best place to perform those
heuristics, and should pages which require such heuristics be deemed
valid webpages?

> > > 4. basing the spec'd definition of alt on common practice on the
> > >    web is crazy, absolutely crazy.
> >
> > I agree that would be a poor choice, since alt is so often used
> > badly (or omitted when it should be provided).  But I don't think
> > HTML 5 _is_ doing that.  Many existing web pages won't be valid HTML
> > 5 specifically because they _don't_ provide alt text.
> 
> Are we writing the spec to make 75% of the existing tag-soup webpages
> conformant?

I don't understand how your question relates to the quoted text above
it; I tried to say that I think HTML 5 (as currently written, with alt
being omitted in a few defined cases) will deem many existing webpages
to be invalid, specifically because of missing alt text.  Widening the
scope of HTML to include the case of serving unknown images has no
impact on the squillions of pages which currently omit alt text for
other reasons.

However, looking at other differences between HTML 5 and HTML 4 (not
related to alt text), there does seem to be a move to make more tag-soup
pages valid HTML 5 than are valid HTML 4.  This seems to be for syntax
which is unambiguous and which browsers have to interpret interoperably
anyway (and in many cases already do): situations in which changing
the syntax to that demanded by HTML 4 would make no practical difference
to how the page is interpreted; the only reason for doing so would be to
obey the standard.

But that's circular: if the standard drops the requirement then there's
no reason at all to do it.  I reckon it's a definite improvement to
focus efforts on asking authors to change things which _do_ make a
difference, rather than just on hoop-jumping.

That so many existing pages are tag-soup suggests that conforming with
HTML 4 was too hard (authors tried and failed), or too unnatural (its
demands were counter to authors' instincts), or unnecessary for
interoperability (authors got the required output without pandering to
those demands).

Smylers
Received on Saturday, 3 May 2008 13:44:58 UTC