- From: Matt May <mattmay@adobe.com>
- Date: Fri, 21 Aug 2009 09:39:06 -0700
- To: Maciej Stachowiak <mjs@apple.com>, "public-html@w3.org WG" <public-html@w3.org>
I object.

On 8/21/09 12:45 AM, "Maciej Stachowiak" <mjs@apple.com> wrote:

> This issue raises an objection that is editorial in nature, and I do
> not believe it will have a material effect on normative requirements.

I disagree. The image analysis statement:

"User agents may also apply image analysis heuristics to help the user make sense of the image when the user is unable to make direct use of the image, e.g. due to a visual disability or because they are using a text terminal with no graphics capabilities."

...is presented in the context of a segment on @alt which, as should be clear to everyone by now, is highly contentious. Taken with the guidance on missing @alt, it suggests that authors can rely on browser technology to repair semantics they have left out for whatever reason--and that @alt is therefore not as necessary as before. It is a dangerous juxtaposition, particularly to an outside observer.

> Further, I believe the premise of the objection is false. The
> objection categorically says that state-of-the-art image analysis
> heuristics cannot recover useful information from an image, "not even
> close".

I stand by that remark.

> There exist optical character recognition algorithms that
> could recover text from an image of text with high probability of
> success.

OCR is achievable, and has been for years. If the sentence read "User agents may also apply image analysis heuristics for OCR," then I'd be in favor of that. One tool that I know of, WebVisum, does that today. But instead it says heuristics could be used "to make sense of the image," which is still a pipe dream.

> There are also image analysis algorithms that can detect
> specific features with fairly good accuracy. For references see
> <http://en.wikipedia.org/wiki/Machine_vision

The successes in machine vision (as described in the Wikipedia article) have been in pattern-matching analysis of highly constrained objects, to take measurements or detect imperfections, as used for quality control in manufacturing.
The entry also says:

"One should not confuse machine vision and computer vision. Computer vision is more general (in the solution of visual problems), whereas machine vision is an engineering discipline mainly concerned with industrial problems."

So let's look at the Wikipedia entry for computer vision:

http://en.wikipedia.org/wiki/Computer_vision

In the section titled "State of the art", the article itself acknowledges the significant limitations of general-purpose image analysis:

"[T]here is no standard formulation of how computer vision problems should be solved. Instead, there exists an abundance of methods for solving various well-defined computer vision tasks, where the methods often are very task specific and seldom can be generalized over a wide range of applications."

Adobe employs many, many people who have expertise in this work, and who keep pace with (or advance) the state of the art. I have talked with a number of them personally. None of them has any confidence that general-purpose image analysis heuristics, run in-browser and producing the level of detail required for meaningful alt text, could be delivered in less than 10 years of sustained R&D. Even then, the author's intent, which is the best indicator of what alt text should be, is not a part of the equation, so this should be considered a last-ditch repair attempt at best.

> I believe the current state of research is beyond what is described
> in that article and at the links.

HTML5 is not a research document, it's a specification. And in a specification which relies so heavily on empirical fact and the current state of browser technology, a pie-in-the-sky statement like this is even more incongruous.

To date, no one, including the editor, has offered any evidence that the approach he suggests is achievable now or in the foreseeable future, much less applicable to users who can't see. And no browser vendor has even hinted that they may be interested in pursuing image analysis.
I would assume that if one were working on it, they would correct me publicly.

Given that some features of the language that have actually been implemented in one form or another have been removed due to insufficient implementation, I have to wonder why a passage that specifies something no one has done or plans to do (or can do with state-of-the-art technology) should stay in.

If a browser developer sees value in image analysis heuristics at any point in the future, there is nothing that prevents them from implementing them. Taking that paragraph out would not impact that. However, leaving it in would lead readers to believe that this is actually a viable approach. It is not.

- m
Received on Friday, 21 August 2009 16:39:49 UTC