Re: ISSUE-66: image analysis heuristics - suggest closing on 2009-09-03 from Maciej Stachowiak on 2009-08-21 (public-html@w3.org from August 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Fri, 21 Aug 2009 12:18:55 -0700
To: Shawn Medero <smedero@uw.edu>
Cc: "public-html@w3.org WG" <public-html@w3.org>, Matt May <mattmay@adobe.com>
Message-id: <5F7AFFD2-F920-4EBE-BD52-53C83742538F@apple.com>

On Aug 21, 2009, at 12:07 PM, Shawn Medero wrote:

> On Thu, Aug 20, 2009 at 10:45 PM, Maciej Stachowiak <mjs@apple.com> wrote:
>
>> Further, I believe the premise of the objection is false. The objection
>> categorically says that state-of-the-art image analysis heuristics cannot
>> recover useful information from an image, "not even close". There exist
>> optical character recognition algorithms that could recover text from an
>> image of text with high probability of success.
>
> Wearing my "former employee for a Linguistics research lab" hat, I'm
> going to point out that OCR of a digital image containing
> Arabic/Bengali text is nowhere near ready for prime time. Read some
> of the system evaluations found in academic literature via Citeseer or
> Google Scholar. Just to be clear, I'm not even talking about
> handwritten Arabic... just OCR of digital images containing text
> written in popular Arabic fonts is not stable enough for commercial
> use. The products claiming to do it only work with one or two fonts
> and require very clean image sources. In the US, only defense
> contractors have access to complex systems that can perform (in terms
> of accuracy and speed) reasonably well on this task.
>
> There's a lot of interesting research in this field... but getting to
> the heart of Matt's point, it is not ready for a spec like HTML 5.
>
> ----
>
> One question I have is what is "image analysis heuristics" really
> saying in this section:
>
> "User agents may also apply image analysis heuristics to help the user
> make sense of the image when the user is unable to make direct use of
> the image, e.g. due to a visual disability or because they are using a
> text terminal with no graphics capabilities."
>
> Was it really referring to OCR-type tasks or something else? I can't
> imagine it was referring to something like OCR or any type of machine
> transcription from an image source ... so I'd rather not waste a
> permathread on what similar technologies can and can't do.
>
> It would also be helpful to know if that text was added based on
> behavior found in deployed implementations.

Would it address anyone's concerns if HTML5 said this in a more  
generic way that is agnostic about the means used to extract info from  
an image? Something like "User agents MAY use whatever means they have  
to get additional information about an image to provide to the user."   
I think the basic idea here is to say it's ok to do something smart.

Regards,
Maciej

Received on Friday, 21 August 2009 19:19:37 UTC