- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 9 Mar 2010 10:44:44 +0000 (UTC)
- To: public-html@w3.org
SUMMARY
The spec is very vague about what image analysis techniques could be
applied to images. This change proposal suggests including more detail
about possible techniques.
RATIONALE
Currently the <img> element section mentions that UAs "may also apply
heuristics to help the user make use of the image when the user is unable
to see it", but the only suggested heuristic is OCR.
In practice, there are a host of other heuristics that could help a user
make sense of an image, and they might be useful even to users who _can_
see the image. We do all users a disservice by not being more explicit
here. Being explicit could encourage significant competition amongst user
agents, leading to a much better user experience for everyone.
Since these heuristics are in many cases already implemented and shipping,
sometimes in multiple products from multiple vendors, and since recent
advances in image recognition techniques have been fast and furious, it
seems reasonable to mention these techniques as real possibilities.
DETAILS
Strike "when the user is unable to see it". Instead, start a new sentence
before the "e.g." that says "This would be especially useful to users who
cannot see the image", and add the following after the "e.g." clauses, in
a separate clause: "but it could also be useful to users who _can_ see the
image but might not fully understand or recognise it".
Move "optical character recognition (OCR) of text found within the image"
to be the first bullet of a bulleted list, and add the following
additional points:
* Facial recognition in photographs, especially facial recognition of
  notable individuals or of individuals in the user's social network.
* Product or brand recognition in photographs or logos.
* Barcode recognition of any embedded barcodes.
* Bitmap to vector analysis for diagrams, allowing images to be
  further analysed in specialised tools.
* Data extraction for graphs, allowing data to be reconstructed from
  bar charts, pie charts, and the like, or allowing regression lines
  to be fitted to x,y plots.
* Landmark recognition for photographs.
* 3D reconstruction of scenes based on multiple images, allowing a set
  of images to be taken together and explored in context.
IMPACT
POSITIVE EFFECTS
Adding such text could lead to a renewed level of competition in browsers
as they find the best ways to expose such tools to users.
Such competition would inevitably lead to improved accessibility across
the board, as many of these analysis techniques could provide users with
anything from a basic hint of the image's contents to fully interactive
reconstructions of the image in more accessible forms (especially in the
case of text-in-image or graphs).
NEGATIVE EFFECTS
Makes the spec longer.
CONFORMANCE CLASS CHANGES
None.
RISKS
It has been suggested that mentioning that user agents might be able to
repair non-conforming pages could make authors less likely to write
conforming pages. However, it is not clear why this would apply here and
not in the many other parts of the spec that mention repair techniques,
especially the sections that explicitly mandate specific user agent
repair techniques.
--
Ian Hickson U+1047E
http://ln.hixie.ch/ U+263A
Things that are impossible just take longer.
Received on Tuesday, 9 March 2010 10:45:14 UTC