Re: ISSUE-66: image analysis heuristics - suggest closing on 2009-09-03 from Maciej Stachowiak on 2009-08-21 (public-html@w3.org from August 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Fri, 21 Aug 2009 12:29:55 -0700
To: Matt May <mattmay@adobe.com>
Cc: "public-html@w3.org WG" <public-html@w3.org>
Message-id: <EC11329A-6B9A-430C-B7C0-FE3D18763FB0@apple.com>

Hi Matt,

On Aug 21, 2009, at 9:39 AM, Matt May wrote:

> I object.
>
> On 8/21/09 12:45 AM, "Maciej Stachowiak" <mjs@apple.com> wrote:
>> This issue raises an objection that is editorial in nature, and I do
>> not believe it will have a material effect on normative requirements.
>
> I disagree. The image analysis statement:
>
> "User agents may also apply image analysis heuristics to help the user make
> sense of the image when the user is unable to make direct use of the image,
> e.g. due to a visual disability or because they are using a text terminal
> with no graphics capabilities."
>
> ...is presented in the context of a segment on @alt which, as should be
> clear to everyone by now, is highly contentious. Taken with the guidance on
> missing @alt, it suggests that authors can rely on browser technology to
> repair semantics they have left out for whatever reason--and therefore, @alt
> is not as necessary as before. It is a dangerous juxtaposition, particularly
> to an outside observer.

I don't think it gives that impression. The guidance on missing alt  
for content authors / producers is that they MUST provide some
best-effort text description via @title or a <figure> <legend>, and  
only in the case where image contents are unknown. This text certainly  
does not get you off the hook. Indeed, I think it lets authors off the  
hook *less* than the WAI's proposed "missing marker" solution.

The intent is to let browsers use whatever smarts they have available  
in this unfortunate case where the producer of alt cannot describe the  
image. Would you be satisfied if it were described in more generic  
terms?

>
>> Further, I believe the premise of the objection is false. The
>> objection categorically says that state-of-the-art image analysis
>> heuristics cannot recover useful information from an image, "not even
>> close".
>
> I stand by that remark.
>
>> There exist optical character recognition algorithms that
>> could recover text from an image of text with high probability of
>> success.
>
> OCR is achievable, and has been for years. If the sentence read "User agents
> may also apply image analysis heuristics for OCR," then I'd be in favor of
> that. One tool that I know of, WebVisum, does that today.
>
> But instead it says heuristics could be used "to make sense of the image,"
> which is still a pipe dream.

Perhaps a phrase like "describe features of the image" might be better.

Let me give an example of what I think is feasible with the state of  
the art. Check out Google Labs Similar Images:
<http://similar-images.googlelabs.com/>. Using algorithms like this
against a corpus of baseline images labeled with keywords, it should
be possible to give a rough description of the contents of many
images. That's far short of
"mak[ing] sense of the image" but also quite a bit more advanced than  
OCR.

> Adobe employs many, many people who have expertise in this work, and who
> keep pace with (or advance) the state of the art. I have talked with a
> number of them personally. None of them have any confidence that
> general-purpose image analysis heuristics done in-browser with the level of
> detail that is required for meaningful alt text could be done in less than
> 10 years of sustained R&D. Even then, the author's intent, which is the best
> indicator of what alt text should be, is not a part of the equation, so this
> should be considered a last-ditch repair attempt at best.

That is exactly what it's considered - a last-ditch repair attempt.  I  
believe this is clear in context. I believe the motivation for this  
allowance is to align with the ATAG2 position that last-ditch repair for
missing alt is best done by the user agent, not the authoring tool.

>
>> I believe the current state of research is beyond what is described
>> in that article and at the links.
>
> HTML5 is not a research document, it's a specification. And in a
> specification which relies so heavily on empirical fact and the current
> state of browser technology, a pie-in-the-sky statement like this is even
> more incongruous.

[...]

> If a browser developer sees value in image analysis heuristics at any point
> in the future, there is nothing that prevents them from implementing them.
> Taking that paragraph out would not impact that. However, leaving it in
> would lead readers to believe that this is actually a viable approach. It is
> not.

I'd like to see some statement explicitly allowing browsers to use  
whatever means they have at their disposal to provide textual  
information about an image. I think a statement that's more
technology-neutral would be an improvement. For example, there's no
explicit
allowance for using EXIF tags, even though those could provide useful  
info about many images without the need for science fictional  
algorithms.

Regards,
Maciej
Received on Friday, 21 August 2009 19:30:37 UTC