Re: ISSUE-66: image analysis heuristics - suggest closing on 2009-09-03 from Shawn Medero on 2009-08-21 (public-html@w3.org from August 2009)

From: Shawn Medero <smedero@uw.edu>
Date: Fri, 21 Aug 2009 12:07:06 -0700
To: Maciej Stachowiak <mjs@apple.com>
Cc: "public-html@w3.org WG" <public-html@w3.org>, Matt May <mattmay@adobe.com>
Message-ID: <b0c61c410908211207r414d5cb0rd7293f694d7ab9e0@mail.gmail.com>

On Thu, Aug 20, 2009 at 10:45 PM, Maciej Stachowiak <mjs@apple.com> wrote:

> Further, I believe the premise of the objection is false. The objection
> categorically says that state-of-the-art image analysis heuristics cannot
> recover useful information from an image, "not even close". There exist
> optical character recognition algorithms that could recover text from an
> image of text with high probability of success.

Wearing my "former employee for a Linguistics research lab" hat, I'm
going to point out that OCR of a digital image containing
Arabic/Bengali text is nowhere near ready for prime time. Read some
of the system evaluations found in academic literature via Citeseer or
Google Scholar. Just to be clear, I'm not even talking about
handwritten Arabic... just OCR of digital images containing text
written in popular Arabic fonts is not stable enough for commercial
use. The products claiming to do it only work with one or two fonts
and require very clean image sources. In the US, only defense
contractors have access to complex systems that can perform (in terms
of accuracy and speed) reasonably well on this task.

There's a lot of interesting research in this field... but, getting to
the heart of Matt's point, it is not ready for a spec like HTML 5.

----

One question I have is what "image analysis heuristics" is really
saying in this section:

"User agents may also apply image analysis heuristics to help the user
make sense of the image when the user is unable to make direct use of
the image, e.g. due to a visual disability or because they are using a
text terminal with no graphics capabilities."

Was it really referring to OCR-type tasks or something else? I can't
imagine it was referring to something like OCR or any type of machine
transcription from an image source ... so I'd rather not waste a
permathread on what similar technologies can and can't do.

It would also be helpful to know if that text was added based on behavior
found in deployed implementations.

-s

Received on Friday, 21 August 2009 19:07:44 UTC